PREFACE


The real payoff for artificial intelligence (AI) is applications. It is applications that have thrust
AI into prominence and commercialization in the 1980’s. This report presents overviews of key
application areas: Expert Systems, Computer Vision, Natural Language Processing, Speech
Interfaces, and Problem Solving and Planning. The basic approaches to such systems, the state of
the art, existing systems, and future trends and expectations are covered.

It is anticipated that this report will prove useful to engineering and research managers,
potential users and others who will be affected by the rapidly growing area of AI applications.

This  report  is  part  of  the  NBS/NASA  series  of  overviews  on  AI  and  Robotics.  Due  to  the  scope 
of  AI,  Volume  I  —  Artificial  Intelligence  —  is  issued  in  three  parts  (this  report  being  Part  B): 

Part  A:  The  Core  Ingredients,  NASA  TM  85836,  June  1983 

I.  Artificial  Intelligence — What  It  Is 

II.  The  Rise,  Fall  and  Rebirth  of  AI 

III.  Basic  Elements  of  AI 

IV.  Applications 

V.  The  Principal  Participants 

VI.  State-of-the-Art 

VII.  Towards  the  Future 
Sources  for  Further  Information 
Glossary 

Part  B:  Applications,  NASA  TM  85838,  Sept.  1983 

I.  Expert  Systems 

II.  Computer  Vision 

III.  Natural  Language  Processing 

IV.  Speech  Recognition  and  Speech  Understanding 

V.  Speech  Synthesis 

VI.  Problem-Solving  and  Planning 

Part  C:  Basic  AI  Topics,  NASA  TM  85839,  Oct.  1983 

I.  Artificial  Intelligence  and  Automation 

II.  Search-Oriented  Automated  Problem  Solving  and  Planning 

III.  Knowledge  Representation 
FOREWORD


The opening of the decade of the 80’s saw Artificial Intelligence (AI) move from being primarily
a research topic to commercial applications. The full impact of this transition has yet to be felt.

AI  has  been  designated  by  the  U.S.  Defense  Science  Board  as  one  of  the  top  10  major  payoff 
areas  for  the  military.  It  has  been  made  the  core  ingredient  of  Japan’s  Fifth  Generation  computer 
research  project  by  which  they  seek  to  catapult  Japan  into  the  dominant  information  society  in 
the  1990’s.  Similar  importance  has  been  attached  to  AI  in  the  U.S.,  Great  Britain  and  France. 

This  report  summarizes  the  key  AI  application  areas  of  Expert  Systems,  Computer  Vision, 
Natural  Language  Processing,  Speech  Interfaces,  and  Problem  Solving  and  Planning.  More 
detailed  information  can  be  found  in  the  following  documents  available  from  the  National 
Technical  Information  Service  (NTIS),  Springfield,  VA  22161. 

An  Overview  of  Expert  Systems,  NBSIR  2505 
May  1982  (Revised  October  1982) 

An  Overview  of  Computer  Vision,  NBSIR  2582 
September  1982 

An  Overview  of  Natural  Language  Processing 
NBSIR  83-2687,  April  1983 
NASA  TM  85635,  April  1983 

Two  emerging  AI  topics  —  Automatic  Programming,  and  Machine  Learning  —  are  not  treated 
separately  in  this  report  but  are  included  under  Expert  Systems. 

This  document  is  Part  B  of  the  three  part  report: 

An  Overview  of  Artificial  Intelligence  and  Robotics 
Volume  I  —  Artificial  Intelligence 

Part  A  —  The  Core  Ingredients,  NASA  TM  85836,  June  1983 
Part  B  —  Applications,  NASA  TM  85838,  Sept.  1983 
Part  C  —  Basic  AI  Topics,  NASA  TM  85839,  Oct.  1983 

The  important  AI  application  areas  of  robotics  and  automated  manufacturing  are  treated  in 
An  Overview  of  Artificial  Intelligence  and  Robotics 

Volume  II  —  Robotics,  NBSIR  82-2479,  March  1982. 
I. EXPERT SYSTEMS


A.  Introduction 

Expert Systems is probably the “hottest” topic in Artificial Intelligence (AI) today. Prior to the
last decade, in trying to find solutions to problems, AI researchers tended to rely on
non-knowledge-guided search techniques or computational logic. These techniques were successfully
used to solve elementary problems or very well structured problems such as games. However, the
search space of complex real-world problems tends to expand exponentially with the number of
parameters involved. For such problems, these older techniques have generally proved to be
inadequate and a new approach was needed. This new approach emphasized knowledge rather than
search and has led to the field of Knowledge Engineering and Expert Systems. The resultant expert
systems technology, limited to academic laboratories in the 70’s, is now becoming cost-effective
and is beginning to enter into commercial applications.

B.  What  is  an  Expert  System? 

Feigenbaum (1982, p. 1), a pioneer in expert systems, states:

An  “expert  system”  is  an  intelligent  computer  program  that  uses  knowledge  and  inference  procedures  to  solve 
problems  that  are  difficult  enough  to  require  significant  human  expertise  for  their  solution.  The  knowledge 
necessary  to  perform  at  such  a  level,  plus  the  inference  procedures  used,  can  be  thought  of  as  a  model  of  the 
expertise  of  the  best  practitioners  of  the  field. 

The  knowledge  of  an  expert  system  consists  of  facts  and  heuristics.  The  “facts”  constitute  a  body  of  information 
that  is  widely  shared,  publicly  available,  and  generally  agreed  upon  by  experts  in  a  field.  The  “heuristics”  are 
mostly  private,  little-discussed  rules  of  good  judgement  (rules  of  plausible  reasoning,  rules  of  good  guessing)  that 
characterize  expert-level  decision  making  in  the  field.  The  performance  level  of  an  expert  system  is  primarily  a 
function  of  the  size  and  quality  of  the  knowledge  base  that  it  possesses. 

It  has  become  fashionable  today  to  characterize  any  large,  complex  AI  system  that  uses  large 
bodies  of  domain  knowledge  as  an  expert  system.  Thus,  nearly  all  AI  applications  to  real-world 
problems  can  be  considered  in  this  category,  though  the  designation  “knowledge-based  systems” 
is  more  appropriate. 

C.  The  Basic  Structure  of  an  Expert  System 

An  expert  system  consists  of: 

(1)  a  knowledge  base  (or  knowledge  source)  of  domain  facts  and  heuristics  associated  with  the 
problem; 

(2) an inference procedure (or control structure) for utilizing the knowledge base in the
solution of the problem;

(3) a working memory (“global data base”) for keeping track of the problem status, the input
data for the particular problem, and the relevant history of what has thus far been done.
A human “domain expert” usually collaborates to help develop the knowledge base. Once the
system  has  been  developed,  in  addition  to  solving  problems,  it  can  also  be  used  to  help  instruct 
others  in  developing  their  own  expertise. 

It  is  desirable,  though  not  yet  common,  to  have  a  user-friendly  natural  language  interface  to 
facilitate  the  use  of  the  system  in  all  three  modes:  development,  problem  solving,  instruction.  In 
some sophisticated systems, an explanation module is also included, allowing the user to challenge
and examine the reasoning process underlying the system’s answers. Figure 1-1 is a diagram
of  an  idealized  expert  system.  When  the  domain  knowledge  is  stored  as  production  rules,  the 
knowledge  base  is  often  referred  to  as  the  “rule  base,”  and  the  inference  engine  as  the  “rule 
interpreter.” 

An  expert  system  differs  from  more  conventional  computer  programs  in  several  important 
respects.  Duda  (1981,  p.  242)  observes  that,  in  an  expert  system  “.  .  .  there  is  a  clear  separation  of 
general  knowledge  about  the  problem  (the  rules  forming  a  knowledge  base)  from  information 
about  the  current  problem  (the  input  data)  and  the  methods  for  applying  the  general  knowledge 
to  the  problem  (the  rule  interpreter).”  In  a  conventional  computer  program,  knowledge  pertinent 
to  the  problem  and  methods  for  utilizing  this  knowledge  are  all  intermixed,  so  that  it  is  difficult  to 
change the program. In an expert system, “. . . the program itself is only an interpreter (or
general reasoning mechanism) and (ideally) the system can be changed by simply adding or
subtracting rules in the knowledge base.”

Figure 1-1. Basic Structure of an Expert System. (Diagram not reproduced; surviving labels: USER,
KNOWLEDGE SOURCE, SYSTEM STATUS.)

D.  The  Knowledge  Base 

The  most  popular  approach  to  representing  the  domain  knowledge  (both  facts  and  heuristics) 
needed  for  an  expert  system  is  by  production  rules  (also  referred  to  as  “SITUATION-ACTION 
rules”  or  “IF-THEN  rules”).*  Thus,  often  a  knowledge  base  is  made  up  mostly  of  rules  which  are 
invoked  by  pattern  matching  with  features  of  the  task  environment  as  they  currently  appear  in  the 
global  data  base. 
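
As a rough illustration of rules being invoked by pattern matching against the global data base, the sketch below stores each production rule as a set of IF conditions and a THEN conclusion. The `Rule` class, the `enabled_rules` helper, and the engine-trouble facts are hypothetical names invented for this example; they are not taken from any of the systems discussed in this report, and real rule languages support far richer patterns than set membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production (IF-THEN) rule; the field names are illustrative."""
    name: str
    conditions: frozenset  # IF side: facts that must all be present
    conclusion: str        # THEN side: fact asserted when the rule fires


def enabled_rules(rules, global_data_base):
    """Return the rules whose IF side matches the current global data base."""
    return [r for r in rules if r.conditions <= global_data_base]


# Hypothetical knowledge base for a toy diagnosis task.
knowledge_base = [
    Rule("R-A", frozenset({"engine cranks", "no spark"}), "ignition fault"),
    Rule("R-B", frozenset({"engine cranks", "no fuel"}), "fuel fault"),
    Rule("R-C", frozenset({"ignition fault"}), "check coil"),
]

# Facts about the current problem (the working memory).
global_data_base = {"engine cranks", "no spark"}

matched = [r.name for r in enabled_rules(knowledge_base, global_data_base)]
print(matched)
```

Because the rules are plain data, adding or removing one changes the system's behavior without touching the matching code, which is the separation of knowledge from interpreter described in Section C.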

E.  The  Control  Structure 

In an expert system a problem-solving paradigm must be chosen to organize and control the
steps taken to solve the problem. A common but powerful approach involves the chaining of
IF-THEN rules to form a line of reasoning. The rules are actuated by patterns in the global data
base which, depending on the strategy, match either the IF or the THEN side of the rules. The
application of a rule changes the system status and therefore the data base, enabling some rules
and disabling others. The rule interpreter uses a control strategy for finding the enabled rules and
for deciding which of the enabled rules to apply. The basic control strategies used may be
top-down (goal driven), bottom-up (data driven), or a combination of the two that uses a
relaxation-like convergence process to join these opposite lines of reasoning together at some
intermediate point to yield a problem solution. However, virtually all the heuristic search and
problem solving techniques that the AI community has devised have appeared in the various expert
systems.
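
The two basic chaining directions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the control structure of any particular system: rules are hypothetical `(conditions, conclusion)` pairs, facts are plain strings, and the only conflict-resolution strategy is "apply rules in the order listed." The function names `forward_chain` and `backward_chain` are chosen for this example.

```python
def forward_chain(rules, facts):
    """Data-driven (bottom-up): fire enabled rules until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule is enabled when its IF side matches the data base.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # firing the rule changes the data base
                changed = True
    return facts


def backward_chain(rules, facts, goal, seen=frozenset()):
    """Goal-driven (top-down): match the goal against THEN sides and recurse."""
    if goal in facts:
        return True
    if goal in seen:  # avoid looping on circular rules
        return False
    for conditions, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(rules, facts, c, seen | {goal}) for c in conditions
        ):
            return True
    return False


# Hypothetical rules: (IF conditions, THEN conclusion).
rules = [
    (frozenset({"engine cranks", "no spark"}), "ignition fault"),
    (frozenset({"ignition fault"}), "check coil"),
]
observed = {"engine cranks", "no spark"}

derived = forward_chain(rules, observed)
proved = backward_chain(rules, observed, "check coil")
print(sorted(derived), proved)
```

Forward chaining derives everything the data supports, whether or not it is needed; backward chaining explores only the rules relevant to the stated goal. Which is preferable depends on the shape of the problem.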

F.  Uses  of  Expert  Systems 

The  uses  of  expert  systems  are  virtually  limitless.  They  can  be  used  to:  diagnose,  repair, 
monitor,  analyse,  interpret,  consult,  plan,  design,  instruct,  explain,  learn,  and  conceptualize. 

G.  Architecture  of  Expert  Systems 

One way to classify expert systems is by function (e.g., diagnosis, planning, etc.). However,
examination of existing expert systems indicates that there is little commonality in detailed system
architecture that can be detected from this classification. A more fruitful approach appears to be
to look at problem complexity and problem structure and deduce what data and control
structures might be appropriate to handle these factors.

The  Knowledge  Engineering  community  has  evolved  a  number  of  techniques  (presented  in  the 
excellent  tutorial  by  Stefik  et  al.  (1982)  and  summarized  in  Gevarter  (1982))  which  can  be  utilized 
in  devising  suitable  expert  system  architectures. 

The use of these techniques in four existing expert systems is illustrated in Tables 1-1-1 through
1-1-4. These tables outline the basic approach taken by each of these expert systems and show
how the approach translates into key elements of the Knowledge Base, Global Data Base and
Control Structure. An indication of the basic control structures of the systems in Tables 1-1-1
through 1-1-4, and of some of the other well known expert systems, is given in Table 1-2.

TABLE 1-1-3. Characteristics of Example Expert Systems.

*Not all expert systems are rule-based. The network-based expert systems MACSYMA, INTERNIST/CADUCEUS,
Digitalis Therapy Advisor, HARPY and PROSPECTOR are examples which are not. Buchanan and Duda (1982) state
that the basic requirements in the choice of an expert system knowledge representation scheme are extendibility,
simplicity and explicitness. Thus, rule-based systems are particularly attractive.

Table 1-2 represents expert system control structures in terms of the search direction, the
control techniques utilized, and the search space transformations employed. The approaches used in
the various expert systems are different implementations of two basic ideas for overcoming the
combinatorial explosion associated with search in complex real-world problems. These two ideas are:

(1)  Find  ways  to  efficiently  search  a  space, 

(2)  Find  ways  to  transform  a  large  search  space  into  smaller  manageable  chunks  that  can  be 
searched  efficiently. 
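
A small arithmetic sketch may make the second idea concrete. It assumes, purely for illustration, a search tree with a uniform branching factor and depth, and a problem that splits cleanly into independent subproblems of equal depth. Neither assumption comes from the report, and real problems rarely decompose this neatly; the function names are hypothetical.

```python
def full_space(branching, depth):
    """Leaf count of a uniform search tree searched as one space."""
    return branching ** depth


def chunked_space(branching, depth, chunks):
    """Total leaf count when the problem splits into independent subproblems.

    Assumes depth is divisible by chunks and that subproblems do not interact.
    """
    if depth % chunks != 0:
        raise ValueError("depth must be divisible by chunks")
    return chunks * branching ** (depth // chunks)


whole = full_space(10, 6)         # one space of 10**6 leaves
pieces = chunked_space(10, 6, 3)  # three spaces of 10**2 leaves each
print(whole, pieces)
```

Under these assumptions the decomposed problem has 300 leaves to examine rather than 1,000,000. Interactions between subproblems erode this saving in practice, which is why the choice of decomposition matters so much.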

It  will  be  observed  from  Table  1-2  that  there  is  little  architectural  commonality  based  either  on 
function  or  domain  of  expertise.  Instead,  expert  system  design  may  best  be  considered  as  an  art 
form,  like  custom  home  architecture,  in  which  the  chosen  design  can  be  implemented  from  the 
collection  of  available  AI  techniques  in  heuristic  search  and  problem  solving. 

In  addition  to  the  techniques  indicated  in  Table  1-2,  also  emerging  are  distributed  knowledge 
and  problem  solving  approaches  exemplified  by  the  MDX  expert  system  (Chandrasekaran,  1983) 
and  the  object-oriented  programming  language,  LOOPS  (Stefik  et  al.,  1983). 

H.  Existing  Expert  Systems 

Table  1-3  is  a  list,  classified  by  function  and  domain  of  use,  of  most  of  the  existing  major  expert 
systems.  It  will  be  observed  that  there  is  a  predominance  of  systems  in  the  Medical  and  Chemistry 
domains, following from the pioneering efforts at Stanford University. From the list, it is also
apparent that Stanford University dominates in number of systems, followed by M.I.T., CMU,
BBN and SRI, with several dozen scattered efforts elsewhere.

The  list  indicates  that  thus  far  the  major  areas  of  expert  systems  development  have  been  in 
diagnosis,  data  analysis  and  interpretation,  planning,  computer-aided  instruction,  analysis,  and 
automatic  programming.  However,  the  list  also  indicates  that  a  number  of  pioneering  expert 
systems  already  exist  in  quite  a  number  of  other  functional  areas.  In  addition,  a  substantial  effort 
is  under  way  to  build  expert  systems  as  tools  for  constructing  expert  systems. 

I.  Constructing  an  Expert  System 

Duda (1981, p. 262) states that to construct a successful expert system, the following
prerequisites must be met:

•  there  must  be  at  least  one  human  expert  acknowledged  to  perform  the  task  well. 

•  the  primary  source  of  the  expert’s  exceptional  performance  must  be  special  knowledge, 
judgment,  and  experience. 

•  the  expert  must  be  able  to  explain  the  special  knowledge  and  experience  and  the  methods 
used  to  apply  them  to  particular  problems. 

•  the  task  must  have  a  well-bounded  domain  of  application. 

Using  present  techniques  and  programming  tools,  the  effort  required  to  develop  an  expert 
system  appears  to  be  converging  towards  five  man-years,  with  most  endeavors  employing  two  to 
five  people  in  the  construction. 
TABLE 1-2. Control Structures of Some Well Known Expert Systems.

TABLE 1-3. Existing Expert Systems by Function.

*References to these systems can be found in Duda (1981), Stefik et al. (1982), Buchanan (1981), Buchanan and Duda
(1982), Barr and Feigenbaum (1982), IJCAI-81, and AAAI-82.

Causes of Rainfall      WHY                        BBN
Coaching of a Game      WEST                       BBN
Coaching of a Game      WUMPUS                     M.I.T.
                        SCHOLAR                    BBN
                        Programmer’s Apprentice


J.  Summary  of  the  State-of-the-Art 

Buchanan  (1981,  pp.  6-7)  indicates  that  the  current  state  of  the  art  in  expert  systems  is 
characterized  by: 

•  Narrow  domain  of  expertise 

Because of the difficulty in building and maintaining a large knowledge base, the typical
domain of expertise is narrow. The principal exception is INTERNIST, for which the knowledge
base covers 500 disease diagnoses. However, this broad coverage is achieved by using a relatively
shallow set of relationships between diseases and associated symptoms. (INTERNIST is now
being replaced by CADUCEUS, which uses causal relationships to help diagnose simultaneous
unrelated diseases.)

•  Limited  knowledge  representation  languages  for  facts  and  relations 

•  Relatively  inflexible  and  stylized  input-output  languages 

•  Stylized  and  limited  explanations  by  the  systems 

•  Laborious  construction 

At present, it requires a knowledge engineer to work with a human expert to laboriously extract
and structure the information to build the knowledge base. However, once the basic system has
been built, in a few cases it has been possible to write knowledge acquisition systems to help
extend the knowledge base by direct interaction with a human expert, without the aid of a
knowledge engineer.

•  Single expert as a “knowledge czar.”

We are currently limited in our ability to maintain consistency among overlapping items in the
knowledge base. Therefore, though it is desirable for several experts to contribute, one expert
must maintain control to ensure the quality of the data base.

•  Fragile  behavior 

In addition, most systems exhibit fragile behavior at the boundaries of their capabilities. Thus,
even some of the best systems come up with wrong answers for problems just outside their
domain of coverage. Even within their domain, systems can be misled by complex or unusual cases,
by cases for which they do not yet have the needed knowledge, or by cases with which even the
human experts have difficulty.

•  Requires  Knowledge  Engineer  to  Operate 

Another limitation is that for most current systems only their builders or other knowledge
engineers can successfully operate them, a friendly interface not yet having been constructed.
Nevertheless, Randy Davis (1982) observes that there have been notable successes. A
methodology has been developed for explicating informal knowledge. Representing and using
empirical associations, five systems (DENDRAL, MACSYMA, MOLGEN, R1 and PUFF) have been
routinely solving difficult problems and are in regular use. The first three all have serious
users who are only loosely coupled to the system designers. DENDRAL, which analyzes chemical
instrument data to determine the underlying molecular structure, has been the most widely used
program (see Lindsay et al., 1980). R1, which is used to configure VAX computer systems, has
been reported to be saving DEC twenty million dollars per year, and is now being followed up
with XCON. In addition, as indicated in Table 1-3, dozens of systems have been constructed and
are being experimented with.

K.  Future  Trends 

Figure  1-2  lists  some  of  the  expert  systems  applications  currently  under  development. 

It will be observed that there appear to be few domain or functional limitations in the ultimate
use of expert systems. However, the nature of expert systems is changing. The limitations of
rule-based systems are becoming apparent. Not all knowledge can be readily structured in the form of
empirical associations. Empirical associations tend to hide causal relations (present only
implicitly in such associations). Empirical associations are also inappropriate for highlighting
structure and function.

Thus,  the  newer  expert  systems  are  adding  deep  knowledge  having  to  do  with  causality  and 
structure.  These  systems  will  be  less  fragile,  thereby  holding  the  promise  of  yielding  correct 
answers  often  enough  to  be  considered  for  use  in  autonomous  systems,  not  just  as  intelligent 
assistants. 

The other change is a trend towards an increasing number of non-rule-based systems. These
systems, utilizing semantic networks, frames and other knowledge representations, are often
better suited for causal modeling and representing structure. They also tend to simplify the reasoning
required by providing knowledge representations more appropriate for the specific problem
domain.

•  Medical diagnosis and prescription
•  Medical knowledge automation
•  Chemical data interpretation
•  Chemical and biological synthesis
•  Mineral and oil exploration
•  Planning/scheduling
•  Signal interpretation
•  Signal fusion: situation interpretation from multiple sensors
•  Military threat assessment
•  Tactical targeting
•  Space defense
•  Air traffic control
•  Circuit diagnosis
•  VLSI design
•  Equipment fault diagnosis
•  Computer configuration selection
•  Speech understanding
•  Intelligent Computer-Aided Instruction
•  Automatic Programming
•  Intelligent knowledge base access and management
•  Tools for building expert systems

Figure 1-2. Expert System Applications Now Under Development.
Figure 1-3 (based largely on the Hayes-Roth IJCAI-81 expert systems tutorial and on Feigenbaum,
1982) indicates some of the future opportunities for expert systems. Again, no limitation is
apparent.

It  thus  appears  that  expert  systems  will  eventually  find  use  in  most  endeavors  which  require 
symbolic  reasoning  with  detailed  professional  knowledge  —  which  includes  much  of  the  world’s 
work.  In  the  process,  there  will  be  exposure  and  refinement  of  the  previously  private  knowledge 
in  the  various  fields  of  applications. 

•  Building and Construction
   Design, planning, scheduling, control

•  Equipment
   Design, monitoring, control, diagnosis, maintenance, repair, instruction

•  Command and Control
   Intelligence analysis, planning, targeting, communication

•  Weapon Systems
   Target identification, adaptive control, electronic warfare

•  Professions
   (Medicine, law, accounting, management, real estate, financial, engineering)
   Consulting, instruction, analysis

•  Education
   Instruction, testing, diagnosis, concept formation and new knowledge development from
   experience

•  Imagery
   Photo interpretation, mapping, geographic problem-solving

•  Software
   Instruction, specification, design, production, verification, maintenance

•  Home Entertainment and Advice-giving
   Intelligent games, investment and finances, purchasing, shopping, intelligent information
   retrieval

•  Intelligent Agents
   To assist in the use of computer-based systems

•  Office Automation
   Intelligent systems

•  Process Control
   Factory and plant automation

•  Exploration
   Space, prospecting, etc.

Figure 1-3. Future Opportunities for Expert Systems.

On a more near-term scale, in the next few years we can expect to see expert systems with
thousands of rules. In addition to the increasing number of rule-based systems, we can also expect
to see an increasing number of non-rule-based systems. Also anticipated are much improved
explanation systems that can explain (make “transparent”) why an expert system did what it did
and what things are of importance.

By  the  late  80’s,  we  can  expect  to  see  intelligent,  friendly  and  robust  human  interfaces  and 
much  better  system  building  tools. 

Somewhere around the year 2000, we can expect to see the beginnings of systems which
semi-autonomously develop knowledge bases from text. The result of these developments may very
well herald a maturing information society where expert systems put experts at everyone’s
disposal. In the process, production and information costs should greatly diminish, opening up
major new opportunities for societal betterment.
II. COMPUTER VISION


A.  Introduction 

Computer Vision (visual perception employing computers) shares with “Expert Systems”
the role of being one of the most popular topics in Artificial Intelligence today. The computer
vision field is multifaceted, with many participants, diverse viewpoints, and an extensive
literature. However, the field is still in the early stages of development: organizing
principles have not yet fully crystallized, and the associated technology has not yet been
completely rationalized. Nevertheless, commercial vision systems have already begun to be used in
manufacturing and robotic systems for inspection and guidance tasks, and other systems (at various
stages of development) are beginning to be employed in military, cartographic and image
interpretation applications.

B.  Definition 

Computer  (computational  or  machine)  vision  can  be  defined  as  perception  by  a  computer 
based  on  visual  sensory  input.  Barrow  and  Tenenbaum  (1981,  p.  573)  state: 

Vision  is  an  information-processing  task  with  well-defined  input  and  output.  The  input  consists  of  arrays  of 
brightness  values,  representing  projections  of  a  three-dimensional  scene  recorded  by  a  camera  or  comparable 
imaging  device.  Several  input  arrays  may  provide  information  in  several  spectral  bands  (color)  or  from  multiple 
viewpoints  (stereo  or  time  sequence).  The  desired  output  is  a  concise  description  of  the  three-dimensional  scene 
depicted  in  the  image,  the  exact  nature  of  which  depends  upon  the  goals  and  expectations  of  the  observer.  It 
generally  involves  a  description  of  objects  and  their  interrelationships,  but  may  also  include  such  information  as 
the  three-dimensional  structures  of  surfaces,  their  physical  characteristics  (shape,  texture,  color,  material),  and 
the  locations  of  shadows  and  light  sources  .  .  . 

C.  Relation  to  Human  Vision 

MIT’s Marr and Nishihara (1978, p. 42) take the view that “Artificial Intelligence is (or ought
to be) the study of information processing problems that characteristically have their roots in
some aspects of biological information processing.” They developed a computational theory of
vision based on their study of human vision. Figure II-1 represents the transition from the raw
image through the primal sketch to the 2-1/2D sketch (exemplified by Figure II-2), which contains
information on local surface orientations, boundaries, and depths.

The  primal  sketch,  reminiscent  of  an  artist’s  hurried  drawing,  is  a  primitive  but  rich  description 
of  the  way  the  intensities  change  over  the  visual  field.  It  can  be  represented  by  a  set  of  short  line 
segments separating regions of different brightnesses. A list of the properties of the line
segments, such as location, length, and orientation for each segment, can be used to represent the
primal sketch.
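
To make the "list of line-segment properties" concrete, here is a deliberately simplified sketch. It is not Marr's algorithm: it only looks for vertical boundaries between horizontally adjacent pixels whose gray levels differ by at least a threshold, and it groups vertically adjacent edge points into segments. The function name, the threshold, and the tiny test image are all assumptions made for illustration.

```python
def vertical_edge_segments(image, threshold):
    """Describe vertical boundaries between regions of different brightness.

    Returns a list of dicts giving location (row, col), length and
    orientation, one per run of vertically adjacent edge points.
    """
    rows, cols = len(image), len(image[0])
    segments = []
    for c in range(cols - 1):
        start = None
        for r in range(rows + 1):
            # An edge point lies between columns c and c+1 when brightness jumps.
            is_edge = r < rows and abs(image[r][c + 1] - image[r][c]) >= threshold
            if is_edge and start is None:
                start = r
            elif not is_edge and start is not None:
                segments.append({
                    "location": (start, c + 0.5),
                    "length": r - start,
                    "orientation_deg": 90,
                })
                start = None
    return segments


# Hypothetical 4x6 gray-level array: a dark region beside a bright one.
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 10, 10, 10],
]

segments = vertical_edge_segments(image, threshold=50)
print(segments)
```

The result is a compact symbolic description (one segment, three pixels long, between columns 2 and 3) in place of 24 brightness values. A fuller treatment would also detect horizontal and oblique boundaries, such as the one beneath the bright region here, and would operate at several scales.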

The  late  Dr.  Marr  and  his  associates’  development  of  a  human  visual  information  processing 
theory  (Marr,  1982)  has  had  a  substantial  impact  on  computational  vision. 

The computations begin with representations of the intensities in an image: first the image itself
(e.g., the gray-level intensity array) and then the primal sketch, a representation of spatial variations
in intensity. Next comes the operation of a set of modules, each employing certain aspects of the
information contained in the image to derive information about local orientation, local depth, and
the boundaries of surfaces. From this is constructed the so-called 2-1/2 dimensional sketch. Note
that no "high-level" information is yet brought to bear: the computations proceed by utilizing only
what is available in the image itself.

After: Marr and Nishihara, 1978, p. 42.

Figure II-1. A Framework for Early and Intermediate States in a Theory of Visual Information
Processing.

A candidate for the so-called 2-1/2-dimensional sketch, which encompasses local determinations of
the depth and orientation of surfaces in an image, as derived from processes that operate upon the
primal sketch or some other representation of changes in gray-level intensity. The lengths of the
needles represent the degree of tilt at various points in the surface; the orientations of the needles
represent the directions of tilt . . . Dotted lines show contours of surface discontinuity. No explicit
representation of depth appears in this figure.

Source: Marr and Nishihara, 1978, p. 41.

Figure II-2. An Example of a 2-1/2D Sketch.

There are strong indications (see, e.g., Gevarter, 1977) that the interpretative planning areas of
the human brain set up a context for processing the input data. (This viewpoint is captured by
Minsky’s (1975) AI “frame” concept for knowledge representation.) The brain then uses visual
and other cues from the environment to draw in past knowledge to generate an internal
representation and interpretation of the scene. This knowledge-based expectation-guided approach to
vision is now appearing in advanced AI computer vision systems.

D.  Basis  for  a  General  Purpose  Image  Understanding  System 
Barrow and Tenenbaum (1981, p. 573) observe that, in going from a scene to an image (an array
of brightness values), the image encodes much information about the scene, but the information
is confounded in the single brightness value at each point. In projecting onto the
two-dimensional image, information about the three-dimensional structure of the scene is lost. In
order to decode brightness values and recover a scene description, it is necessary to employ a
priori knowledge embodied in models of the scene domain, the illumination, and the imaging
process.

As indicated by Figure II-3, computer vision is an active process that uses these models to
interpret the sensory data. To accommodate the diversity of appearance found in real imagery, a
high-performance, general-purpose system must embody a great deal of knowledge in its models.


Source:  Barrow  and  Tenenbaum,  1981,  p.  573. 


Figure II-3. Model-based Interpretation of Images.

E. Basic Paradigms for Computer Vision*

In broad terms, an image understanding system starts with the array of pixel amplitudes that
define the computer image and, using stored models (either specific or generic), determines the
content of a scene. Typically, various symbolic features such as lines and areas are first
determined from the image. When specific objects are being sought, these features are then
compared with similar features associated with stored models to find a match. In more generic
cases, it is necessary to determine various characteristics of the scene and, using generic models,
to determine from geometric shapes and other factors (such as allowable relationships between
objects) the nature of the scene content.

A  variety  of  paradigms  have  been  proposed  to  accomplish  these  tasks  in  image  understanding 
systems.  These  paradigms  are  based  on  a  common  set  of  broadly  defined  processing  and 
manipulating  elements:  feature  extraction,  symbolic  representation,  and  semantic  interpretation. 
The paradigms differ primarily in how these elements (defined below) are organized and
controlled, and in the degree of artificial intelligence and knowledge employed.

1.  Hierarchical  Bottom-up  Approach 

Figure II-4A is a block diagram of a hierarchical paradigm of an image understanding system
that employs a bottom-up processing approach. The hierarchical bottom-up approach can be
developed successfully for domains with simple scenes made up of only a limited number of
previously known objects.

2.  Hierarchical  Top-down  Approach 

This approach (usually called hypothesize and test), shown in Figure II-4B, is goal directed, the
interpretation stage being guided in its analysis by trial or test descriptions of a scene. An example
would be using template matching (matched filtering) to search for a specific object or
structure within the scene. Matched filtering is normally performed at the pixel level by cross
correlation of an object template with an observed image field. Because of the reduced
dimensionality, it is often computationally advantageous to perform the interpretation at a higher
level in the chain by correlating image features or symbols rather than pixels.
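
The pixel-level matched filtering described above can be sketched as follows. This is a minimal illustration, assuming a small gray-level image held as a list of rows and a normalized form of cross correlation; the function names, the toy image, and the template are hypothetical and are not taken from Pratt (1978).

```python
import math


def normalized_cross_correlation(image, template, row, col):
    """Score in [-1, 1] comparing the template with the image window at (row, col)."""
    th, tw = len(template), len(template[0])
    window = [image[row + r][col + c] for r in range(th) for c in range(tw)]
    flat_t = [template[r][c] for r in range(th) for c in range(tw)]
    mean_w = sum(window) / len(window)
    mean_t = sum(flat_t) / len(flat_t)
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, flat_t))
    den = math.sqrt(sum((w - mean_w) ** 2 for w in window)
                    * sum((t - mean_t) ** 2 for t in flat_t))
    # A flat window (or flat template) carries no shape information: score it as 0.
    return num / den if den > 0 else 0.0


def best_match(image, template):
    """Slide the template over every valid position; return (score, row, col) of the best."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for row in range(len(image) - th + 1):
        for col in range(len(image[0]) - tw + 1):
            score = normalized_cross_correlation(image, template, row, col)
            if score > best[0]:
                best = (score, row, col)
    return best


# Toy 5 x 5 gray-level image containing a bright "L" shape (hypothetical data).
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
TEMPLATE = [
    [9, 0],
    [9, 9],
]

score, row, col = best_match(IMAGE, TEMPLATE)
print(score, row, col)
```

The exhaustive scan visits every window position, which is why correlating a smaller set of extracted features or symbols, as the text notes, can be much cheaper than correlating pixels.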

3.  Heterarchical  Approach 

Hierarchical image understanding systems are normally designed for specific applications.
They thus tend to lack adaptability. A large amount of processing is also usually required. Pratt
(1978, pp. 572-573) observes that often much of this processing is wasted in the generation of
features and symbols not required for the analysis of a particular scene. A technique to avoid this
problem is to establish a central monitor to observe the overall performance of the image
understanding system and then issue commands to the various system elements to modify their
operation to maximize system performance and efficiency.

Figure  II-4C  is  a  block  diagram  of  an  image  understanding  system  that  achieves  heterarchical 
operation  by  distributed  feedback  control. 

*This section is primarily based on Pratt, 1978, pp. 570-574.

[Figure II-4: four block diagrams. A. Hierarchical Bottom-up Approach; B. Hierarchical Top-down
Approach; C. Heterarchical Approach; D. Blackboard Approach. Surviving block labels: FEATURES,
SYMBOLS, FEATURE CONTROL.]

Source: Pratt, 1978, pp. 570-574.

Figure II-4. Basic Image Understanding Paradigms.

4. Blackboard Approach

Another image understanding system configuration called the blackboard model has been
proposed by Reddy and Newell (1975). Figure II-4D is a simplified representation of this approach in
which the various system elements communicate with each other via a common working data
storage called the blackboard. Whenever any element performs a task, its output is put into the
common data storage, which is independently accessible by all other elements. The individual
elements can be designed to act autonomously to further the common system goal as required.
The blackboard system is particularly attractive in cases where several hypotheses must be
considered simultaneously and their components need to be kept track of at various levels of
representation.
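
The blackboard organization can be illustrated with a small sketch. It assumes a dictionary keyed by level of representation as the common data storage and two hypothetical elements (an edge finder and an object hypothesizer); none of these names or rules come from Reddy and Newell (1975).

```python
class Blackboard:
    """Common working data storage shared by all processing elements."""

    def __init__(self):
        self.entries = {}  # level of representation -> list of posted items

    def post(self, level, item):
        self.entries.setdefault(level, []).append(item)

    def read(self, level):
        return list(self.entries.get(level, []))


def edge_finder(board):
    """Hypothetical element: posts, per pixel row, the positions where intensity changes."""
    if board.read("edges") or not board.read("pixels"):
        return False  # already done, or nothing to work on yet
    for row in board.read("pixels"):
        board.post("edges", [i for i in range(1, len(row)) if row[i] != row[i - 1]])
    return True


def object_hypothesizer(board):
    """Hypothetical element: proposes an object wherever a row contains exactly two edges."""
    if board.read("objects") or not board.read("edges"):
        return False
    posted = False
    for index, edges in enumerate(board.read("edges")):
        if len(edges) == 2:
            board.post("objects", {"row": index, "span": tuple(edges)})
            posted = True
    return posted


def run(board, elements):
    """Let each element act whenever it can, until no element contributes anything new."""
    progress = True
    while progress:
        progress = False
        for element in elements:
            if element(board):
                progress = True


board = Blackboard()
board.post("pixels", [0, 1, 1, 0])
board.post("pixels", [0, 0, 0, 0])
# The elements are deliberately listed out of order: each one decides for itself,
# from the blackboard contents alone, whether it has anything to do.
run(board, [object_hypothesizer, edge_finder])
print(board.read("objects"))
```

The elements never call one another; all communication passes through the shared storage, which is the property the text attributes to the blackboard model.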

F.  Levels  of  Representation 

A computer vision system, like human vision, is commonly considered to be naturally
structured as a succession of levels of representation.

Tenenbaum, et al. (1979, pp. 254-255) sketch, in Figure II-5, one way to view the
organization of a general-purpose vision system. They divide the figure into two parts. The first is
image oriented (iconic), domain independent, and based on the image data (data driven). The
second part of the figure is symbolic, dependent on the domain and the particular goal of the
vision process.

Figure II-5. Organization of a Visual System.

The first portion takes the image, which consists of an intensity array of picture elements
(“pixels,” e.g., 1000 x 1000), and converts it into image features such as edges and regions. These
are then converted into a set of parallel “intrinsic images,” one each for distance (range), surface
orientation, reflectance,* etc.

The  second  part  of  the  system  segments  these  into  volumes  and  surfaces  dependent  on  our 
knowledge  of  the  domain  and  the  goal  of  the  computation.  Using  domain  knowledge  and  the 
constraints  associated  with  the  relations  among  objects  in  this  domain,  objects  are  identified  and 
the  scene  analyzed  consistent  with  the  system  goal. 

G.  Research  in  Model-Based  Vision  Systems 

Most research efforts in vision have been directed at exploring various aspects of vision, or
toward generating particular processing modules for a step in the vision process, rather than at
devising general purpose vision systems. However, there are currently two major U.S. efforts in
general purpose vision systems: the ACRONYM system at Stanford University under the
leadership of T. Binford, and the VISIONS system at the University of Massachusetts at Amherst under
A. Hanson and E. Riseman.

The ACRONYM system, outlined in Table II-1-1, is designed to be a general purpose,
model-based system that does its major reasoning at the level of volumes rather than images. The system
basically takes a hierarchical top-down approach as in Figure II-4B. ACRONYM has four
essential parts: modeling, prediction, description and interpretation. The user provides ACRONYM
with models of objects (modeled in terms of volume primitives called generalized cones) and their
spatial relationships, as well as generic models and their subclass relationships. These are both
stored in graph form. The program automatically predicts which image features to expect.
Description is a bottom-up process that generates a model-independent description of the image.
Interpretation relates this description to the prediction to produce a three-dimensional
understanding of the scene.

The VISIONS system, outlined in Table II-1-2, can be considered to be a working tool to test
various image understanding modules and approaches. Rather than using specific models, its
high level knowledge is in the form of framelike “schemas” which represent expectations and
expected relationships in particular scene situations. VISIONS is based on monocular images and
does its reasoning at the level of images rather than volumes.

Other research efforts in model-based vision systems are summarized in TABLES III in
Appendix I of Gevarter (1982A). All the research computer vision systems are individually crafted by the
developers, reflecting the developers’ backgrounds, interests and domain requirements. All,
except ACRONYM (and to an extent, 3-D Mosaic, Kanade, 1981), use image (2-D) models and are
viewpoint dependent. Models are mostly described by semantic networks, though feature vectors
are also utilized. The systems, capitalizing on their choice to limit their observations to only a few
objects, use predominantly the top-down interpretation of images approach, relying heavily on
prediction.

*Fraction of normal incident illumination reflected.

[Table fragments: “Predicts appearances of models in images in terms of ribbons and ellipses.”;
“... local rather than viewer-centered primitives, ...”]

TABLE II-1-2. Model-Based Vision Systems.


H.  Industrial  Vision  Systems 

1. General Characteristics

The  prominent  aspect  of  industrial  vision  systems,  in  distinction  to  more  general  vision 
systems,  is  that  they  operate  in  a  relatively  known  and  structured  environment.  In  addition,  the 
situation  (such  as  placement  of  cameras  and  lighting)  can  be  configured  to  simplify  the  computer 
vision  problem.  Usually,  the  number  and  nature  of  possible  objects  will  tend  to  be  restricted,  and 
the  visual  system  will  be  tailored  to  the  function  performed.  Thus  many  of  them  are  based  on  a 
pattern  recognition,  rather  than  an  image  understanding,  approach.  Industrial  vision  systems  are 
characteristically  used  for  such  activities  as  inspection,  manipulation  and  assembly. 

A popular organization for industrial computer vision is a two-stage hierarchy with a
bottom-up control flow. The lower level segments the image into regions corresponding to object
surfaces. The higher level uses this segmentation to identify objects from their surface descriptions.

In practice, most successful systems incorporate aspects of both bottom-up and top-down
control. The bottom-up processing is used to extract prominent features of a part to determine its
position. Then, top-down control is used to direct a search to determine if the part satisfies an
inspection criterion.

Industrial inspection and assembly operations are well suited to model-based analysis, because
of the well-defined geometric descriptions associated with manufactured items. CAD/CAM
technology allows the specification of objects using either volumetric or surface-based models.
These geometrically based models are particularly appropriate to the hypothesis-verify approach,
in which low-level image features are extracted and matched to an appropriate
computer-generated 2-D representation.

In  addition  to  geometric  models,  objects  may  also  be  represented  by  graphs.  In  this  case, 
recognition  becomes  a  graph-matching  process. 

More  commonly  at  present,  rather  than  using  geometric  models  or  graphs,  industrial  vision 
systems  are  taught  by  being  presented  sample  parts  to  be  recognized  in  each  of  their  expected 
stable  states.  Aspects  of  the  resulting  images  are  typically  stored  as  templates,  and  recognition 
becomes  template  matching.  The  objects  can  also  be  represented  in  terms  of  their  characteristic 
features,  such  as  area,  number  of  holes,  etc.,  and  the  resulting  feature  vector  stored  to  be 
matched  (via  a  search  process)  to  the  corresponding  extracted  feature  vector  of  the  image  during 
system  operation. 
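
Feature vector matching of this kind can be sketched as a nearest-neighbor search over the taught vectors. The part names, feature values, scale factors, and rejection threshold below are all hypothetical; a real system would obtain the taught vectors from the training images.

```python
import math

# Hypothetical taught vectors: (area in pixels, number of holes, perimeter in pixels).
TAUGHT_PARTS = {
    "washer": (300.0, 1, 95.0),
    "bracket": (820.0, 2, 180.0),
    "plate": (800.0, 0, 120.0),
}


def distance(a, b, scales):
    """Euclidean distance after dividing each feature by an assumed typical spread."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def classify(features, taught=TAUGHT_PARTS, scales=(100.0, 1.0, 20.0), reject_above=2.0):
    """Return the name of the closest taught part, or None if nothing is close enough."""
    name, best = min(
        ((n, distance(features, v, scales)) for n, v in taught.items()),
        key=lambda pair: pair[1],
    )
    return name if best <= reject_above else None


print(classify((810.0, 2, 175.0)))
print(classify((50.0, 5, 10.0)))
```

Scaling each feature before measuring distance keeps a large-valued feature such as area from dominating a small-valued one such as the hole count; the rejection threshold lets the system report an unknown part rather than force a match.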

To  simplify  industrial  vision  systems,  the  input  is  usually  reduced  to  a  binary  (black  and  white) 
image,  so  that  objects  appear  as  silhouettes.  Simplicity  is  important  in  industrial  vision  systems 
because  the  computation  time  is  limited,  as  most  systems  are  expected  to  operate  in  near  real  time. 
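
The reduction to a binary silhouette image, followed by segmentation into regions, can be sketched as below. The threshold value, the toy image, and the choice of 4-connectivity are assumptions made for illustration.

```python
def to_binary(image, threshold):
    """Reduce a gray-level image to a binary (silhouette) image."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def label_regions(binary):
    """Group object pixels into 4-connected regions; returns a list of pixel lists."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill outward from this unvisited object pixel.
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


# Hypothetical gray-level image: two bright parts on a dark background.
GRAY = [
    [10, 12, 11, 10, 9],
    [11, 200, 210, 10, 9],
    [10, 205, 198, 12, 180],
    [9, 11, 10, 11, 190],
]

regions = label_regions(to_binary(GRAY, threshold=128))
print(sorted(len(region) for region in regions))  # region areas in pixels
```

Each pixel is visited a bounded number of times, so the cost grows linearly with image size, which suits the near-real-time constraint mentioned above. The region areas computed here are the kind of feature that feeds a stored feature vector.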

2.  Examples  of  Efforts  in  Industrial  Visual  Inspection  Systems 

Kruger and Thompson (1981) discuss some example efforts of vision systems designed for
inspection. The systems reviewed are primarily for the inspection of printed circuit boards and IC
chips, with template matching being the predominant inspection approach.

Chin (1982) has recently published an extensive bibliography on automated visual inspection
techniques and applications.

3.  Examples  of  Efforts  in  Industrial  Visual  Recognition  and  Location  Systems 

Table II-2 (largely derived from Kruger and Thompson, 1981) lists some example efforts of
vision systems designed for industrial part recognition and location. All these systems use a
bottom-up approach. It will be observed that (except for Vamos, 1979, and Albus, et al., 1982) these
systems utilize template or feature vector matching. Vamos works from a 3D wire frame
model, using computer graphics type techniques to transform a model projection into
alignment with observed lines in the image.

Albus’  Machine  Vision  Group  in  the  NBS  Industrial  Systems  Division  is  using  simplified  3D 
surface  models  of  machined  parts  to  generate  expectancy  images  from  needed  viewpoints.  The 
group  is  seeking  to  achieve  real-time,  hierarchical,  multi-sensory,  interactive  robot  guidance. 

4.  Commercially  Available  Industrial  Vision  Systems 

Gevarter (1982A) surveys many of the Industrial Vision Systems that are currently
commercially available. Most of the systems require special lighting.

Many  of  the  systems  designed  for  verification  and  inspection  use  pattern  recognition,  rather 
than  AI  techniques.  The  systems  tend  to  be  bottom-up  (see  Figure  II-4A)  because  of  the  speed 
required  to  achieve  real-time  operations.  Often  unique  edge  and  feature  extraction  algorithms  are 
programmed  in  hardware  or  firmware. 

The  more  sophisticated  systems  tend  to  utilize  variations  and  improvements  on  the  SRI  Vision 
Module  described  in  Table  II-2. 

A  few  systems  make  good  use  of  structured  light  for  3D  sensing.  A  number  of  efforts  in  visual 
guidance  of  arc  welding  also  utilize  this  technique. 


I.  Who  Is  Doing  It 

Rosenfeld,  at  the  University  of  Maryland,  issues  a  yearly  bibliography,  arranged  by  subject 
matter,  related  to  the  computer  processing  of  pictorial  information.  The  issue  covering  1981 
(Rosenfeld,  1982)  includes  nearly  1000  references. 

The  following  is  a  list  by  category  of  the  U.S.  “principal  players”  in  computer  vision. 


1.  Research  Oriented 
Universities

Funded Under DARPA IU Program
CMU
U of MD
MIT
U of Mass.
Stanford U
U of Rochester
USC
U of Rhode Island

Other Active Universities
U of Texas at Austin
VPI
Purdue
U of PA
U of IL
Wayne State U
JHU
RPI

TABLE II-2. Example Research Efforts in Industrial Visual Recognition and Location Systems.
[Table body not recovered; the one surviving cell reads “Hungarian Acad. of Science”.]


Non-Profits 

SRI  International,  AI  Center 

JPL 

ERIM 

U.S.  Government 

NBS,  Industrial  Systems  Div.,  Gaithersburg,  MD 
NOSC  (Naval  Ocean  Systems  Center),  San  Diego 
NIH  (National  Institutes  of  Health) 

2.  Commercial  Vision  Systems  Developers 

Hundreds  of  companies  are  now  involved  in  vision  systems,  a  partial  listing  being  given  in 
Gevarter  (1982A). 

J.  Summary  of  the  State-of-the-Art 

1.  Human  Vision 

Human vision is the only available example of a general purpose vision system. Thus far,
however, not many AI researchers have taken an interest in the computations performed by natural
visual systems, although this situation is changing.

The  MIT  vision  group  (among  others)  believes  that,  to  a  first  approximation,  the  human  visual 
system  is  subdivided  into  modules  specializing  in  visual  tasks.  There  is  also  evidence  that  people 
do  global  processing  first  and  use  it  to  constrain  local  processing. 

Considerable  information  now  exists  about  lower  level  visual  processing  in  humans.  However, 
as  we  progress  up  the  human  visual  computing  hierarchy,  the  exact  nature  of  the  appropriate 
representations  becomes  subject  to  dispute.  Thus,  overall  human  visual  perception  is  still  very  far 
from  being  understood. 

2.  Low  and  Intermediate  Levels  of  Processing 

Though powerful methods for high-level visual analysis and understanding are still being
determined, insights into low-level vision are emerging. The basic physics of imaging, and
the nature of constraints in vision and their use in computation, are fairly well understood. Detailed
programs for vision modules, such as “shape from shading” and “optical flow,” have begun to
appear. Also, the representational issues are now better understood.

However, even for well understood low-level operations such as edge detection (see, e.g.,
Ballard, 1982), there has been no convergence among the many techniques proposed, and no
method stands out as the best. In general, edge detectors are still unreliable, though Marr and
Hildreth’s approach, based on the zero crossings of the second derivative of the smoothed image
intensity, appears promising.
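
The zero-crossing idea can be illustrated in one dimension: smooth the intensity profile, take a discrete second derivative, and mark an edge wherever that derivative changes sign. The sketch below assumes a three-point binomial filter as a crude stand-in for Gaussian smoothing and a hypothetical `min_swing` threshold to suppress weak crossings; it is not the published operator.

```python
def smooth(signal):
    """Three-point (1, 2, 1)/4 smoothing of the interior samples."""
    return [(signal[i - 1] + 2 * signal[i] + signal[i + 1]) / 4
            for i in range(1, len(signal) - 1)]


def second_difference(signal):
    """Discrete second derivative s[i-1] - 2*s[i] + s[i+1] of the interior samples."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]


def zero_crossings(signal, min_swing=1.0):
    """Return edge positions, in original sample coordinates, as half-integers.

    Only strict sign changes between neighboring samples are detected; a second
    derivative that lands exactly on zero at a sample is not handled here.
    """
    d2 = second_difference(smooth(signal))
    edges = []
    for k in range(len(d2) - 1):
        if d2[k] * d2[k + 1] < 0 and abs(d2[k] - d2[k + 1]) >= min_swing:
            # d2[k] sits at original index k + 2 (one sample is lost at each end
            # by smoothing and again by differencing), so the crossing lies
            # between original samples k + 2 and k + 3.
            edges.append(k + 2.5)
    return edges


# Hypothetical intensity profile with a single step between samples 3 and 4.
STEP = [10, 10, 10, 10, 50, 50, 50, 50]
print(zero_crossings(STEP))
```

Locating the edge at the sign change, rather than at a thresholded gradient peak, is what lets this style of detector place an edge between samples.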

In industrial vision, the primary technique for achieving robust edge finding and segmentation
is to use special lighting and convert to a silhouette binary image in which edges and regions are
readily distinguishable.

At intermediate levels, edge classification and labelling have been very successfully used in the
blocks world.

Binford (1982), in reviewing existing research in model-based vision systems, observed that most
systems first segment regions, then describe their shape. None of the systems makes effective use
of texture for segmentation and description. In general, shape description is primitive and
interpretation systems have not yet made full use of even these limited capabilities.

As yet, the extraction of useful information from color is extremely rudimentary. The
perceptual use of motion (optical flow) has been a focus of attention recently, but findings are
preliminary.

For low level processing, many recent algorithms take the form of parallel computations
involving local interactions. One popular approach having this character is “relaxation,” in which
local computations are iteratively propagated to try to extract global features. These locally
parallel architectures are well suited to rapid parallel processing techniques using special purpose
VLSI chips.
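
A relaxation computation can be sketched on a one-dimensional chain of sites, each holding a confidence that it lies on an edge. The update rule below is in the spirit of probabilistic relaxation labeling; the two-label setup and the compatibility values (+1 for agreeing neighbors, -1 for disagreeing ones) are assumptions chosen for illustration.

```python
def relax(edge_prob, iterations=10):
    """Iteratively adjust each site's edge confidence from its neighbors' confidences."""
    p = list(edge_prob)
    n = len(p)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            neighbors = [p[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Net neighbor support for the "edge" label at site i, in [-1, 1].
            if neighbors:
                q = sum(2 * pj - 1 for pj in neighbors) / len(neighbors)
            else:
                q = 0.0
            edge = p[i] * (1 + q)
            non_edge = (1 - p[i]) * (1 - q)
            total = edge + non_edge
            # Keep the old value in the degenerate case where both terms vanish.
            updated.append(edge / total if total > 0 else p[i])
        p = updated  # all sites are updated together, as in a parallel machine
    return p


# Hypothetical initial confidences: a mostly confident edge with one weak site.
NOISY = [0.9, 0.8, 0.3, 0.9, 0.85]
print([round(x, 3) for x in relax(NOISY, iterations=5)])
```

Every site uses only its immediate neighbors, yet repeated passes let information travel along the whole chain; this is the sense in which local computations are propagated to extract a global feature, and it is why such schemes map naturally onto parallel hardware.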

3.  Industrial  Vision  Systems 

Barrow  and  Tenenbaum  (1981,  p.  572)  observe  that: 

Significant progress has been made in recent years on practical applications of machine vision. Systems have been
developed that achieve useful levels of performance on complex real imagery in tasks such as inspection of
industrial parts, interpretation of aerial imagery, and analysis of chest x-rays. Virtually all such systems are special
purpose, being heavily dependent on domain-specific constraints and techniques.

It has been estimated that as of mid-1982, though fewer than 50 sophisticated industrial vision
systems were actually in use in the U.S., approximately 1000 simple line-scan inspection systems
were in regular operation. Though special purpose systems have thus far been the most effective,
successful vision applications are now becoming commonplace and are expanding. Vision
manufacturers are now beginning to provide easier user programming, friendlier user interfaces,
and systems engineering support to prospective users. Many firms are now entering the industrial
vision field, with technical leap-frogging being common due to rapidly changing technology.

4.  General  Purpose  Vision  Systems 

Though many practical image recognition systems have been developed, Hiatt (1981, pp. 2, 8)
observes that, “In current vision applications, the type of scene to be processed and acted upon is
usually carefully defined and limited to the capability of the machine . . . General purpose
computer vision has not yet been solved in practice.” This domain specificity makes each new
application expensive and time consuming to develop.

Binford (1982), in reviewing current model-based research vision systems, concludes that most
systems have not attempted to be general vision systems, though ACRONYM does demonstrate
some progress toward this goal. The performance of existing vision systems is strongly limited by
the performance of their segmentation modules, by their weak use of world knowledge, and by
weak descriptions that make little use of shape.

With the exception of ACRONYM (and to an extent 3-D Mosaic), the systems surveyed depend
on image models and relations, and therefore are strongly viewpoint-dependent. To generalize to
viewpoint-insensitive interpretations would require three-dimensional modeling and
interpretation as in ACRONYM.

Binford concludes that though the results of these and other efforts are encouraging as first
demonstrations, nevertheless as general vision systems, they have a long way to go.

K.  Applications  and  Future  Trends 

Brady  (1981,  p.  2)  states  that,  “There  is  currently  a  surge  of  interest  in  image  understanding  on 
the  part  of  industry.”  Examples  of  current  computer  vision  applications  are  indicated  in 
Figure  II-6. 

As  the  field  of  computer  vision  unfolds,  we  expect  to  see  the  following  future  trends.* 

1.  Techniques 

•  Though most industrial vision systems have used binary representations, we can expect
increased use of gray scales because of their potential for handling scenes with cluttered
backgrounds and uncontrolled lighting.

•  Recent theoretical work on monocular shape interpretation from images (shape from
shading, texture, etc.) makes it appear promising that general mechanisms for generating
spatial observations from images will be available within the next 2 to 5 years to support
general vision systems.

•  Successful  techniques  (such  as  stereo  and  motion  parallax)  for  deriving  shape  and/or  motion 
from  multiple  images  should  also  be  available  within  2  to  5  years. 

•  The  mathematics  of  Image  Understanding  will  continue  to  become  more  sophisticated. 

•  The links now growing between Image Understanding and Theories of Human Vision will
continue to expand.

2.  Hardware  and  Architecture 

•  We are now seeing hardware and software emerge that enable real-time operation in
simple situations. Within the next 2 to 5 years we should see hardware and software that will
enable similar real-time operation for robotics and other activities requiring recognition, and
position and orientation information.

•  Fast raster-based pipeline preprocessing hardware to compute low-level features in local
regions of an entire scene is now becoming available and should find general use in
commercial vision systems in 2 to 4 years.

•  Because processing seems inherently parallel at virtually all visual levels, parallel processing
is a wave of the future (but not the entire answer).

•  Relaxation  and  constraint  analysis  techniques  are  on  the  increase  and  will  be  increasingly 
reflected  in  future  architectures. 

*These trends have been largely derived from statements by Brady (1981A, 1981B), Binford (1982), Kruger and
Thompson (1981), Agin (1980), Arden (1980), Rosenfeld (1981), Hiatt (1981), and Barrow and Tenenbaum (1981).

AUTOMATION OF INDUSTRIAL PROCESSES

Object acquisition by robot arms, for example, for sorting or packing items arriving on
conveyor belts.

Automatic guidance of seam welders and cutting tools.

VLSI-related processes, such as lead bonding, chip alignment and packaging.

Monitoring, filtering, and thereby containing the flood of data from oil drill sites or from
seismographs.

Providing visual feedback for automatic assembly and repair.

INSPECTION TASKS

The inspection of printed circuit boards for spurs, shorts, and bad connections.

Checking the results of casting processes for impurities and fractures.

Screening medical images such as chromosome slides, cancer smears, x-ray and ultrasound
images, tomography.

Routine screening of plant samples.

Inspection of alpha-numerics on labels and manufactured items.

Checking packaging and contents in pharmaceutical and food industries.

Inspection of glass items for cracks, bubbles, etc.

REMOTE SENSING

Cartography, the automatic generation of hill-shaded maps, and the registration of satellite
images with terrain maps.

Monitoring traffic along roads, docks, and at airfields.

Management of land resources such as water, forestry, soil erosion, and crop growth.

Detecting mineral ore deposits.

MAKING COMPUTER POWER MORE ACCESSIBLE

Management information systems that have a communication channel considerably wider than
current systems that are addressed by typing or pointing.

Document readers (for those who still use paper).

Design aids for architects and mechanical engineers.

MILITARY APPLICATIONS

Tracking moving objects.

Automatic navigation based on passive sensing.

Target acquisition and range finding.

AIDS FOR THE PARTIALLY SIGHTED

Systems that read a document and speak what they read.

Automatic “guide dog” navigation systems.

Figure II-6. Examples of Applications of Computer Vision Now Underway.

3. AI and General Vision Systems

Computer vision will be a key factor in achieving many artificial intelligence applications. The
goal is to move from special-purpose visual processing to general-purpose computer vision. Work
to date in model-based systems has made a tentative beginning. But the long-run goal is to be able
to deal with unfamiliar or unexpected input.* Reasoning in terms of generic models and
reasoning by analogy are two approaches being pursued. However, it is anticipated that it will be a
decade or more before substantial progress will be made.

4.  Modeling  and  Programming 

•  Now emerging is 3D modeling, arising largely from CAD/CAM technology. 3D CAD/CAM
data bases will be integrated with industrial vision systems to realistically generate
synthesized images for matching with visual inputs.

•  Illumination  models,  shading  and  surface  property  models  will  be  increasingly  incorporated 
into  visual  systems. 

•  Volumetric  models  which  allow  prediction  and  interpretation  at  the  levels  of  volumes,  rather 
than  images,  will  see  greater  utilization. 

•  High  level  vision  programming  languages  (such  as  Automatix’s  RAIL)  that  can  be  integrated 
with  robot  and  industrial  manufacturing  languages  are  now  beginning  to  appear  and  will 
become  commonplace  within  5  years. 

•  Generic  representations  for  amorphous  objects  (such  as  trees)  have  been  experimentally 
utilized  and  should  become  generally  available  within  5  years. 

5.  Knowledge  Acquisition 

•  Strategies  for  indexing  into  a  large  database  of  models  should  be  available  within  the  next  2 
to  5  years. 

•  “Training  by  being  told”  will  supplement  “training  by  example”  as  computer  graphics 
techniques  and  vision  programming  languages  become  more  common. 

6.  Sensing 

•  An important area of development is 3D sensing. Several current industrial vision systems
are already employing structured light for 3D sensing. A number of innovative
techniques in this area are expected to appear in the next 5 years.

•  More  active  vision  sensors  such  as  lidar  are  now  being  explored,  but  are  unlikely  to  find 
substantial  industrial  application  until  the  last  half  of  this  decade. 

7.  Industrial  Vision  Systems 

•  We  will  see  increased  use  of  advanced  vision  techniques  in  industrial  vision  systems, 
including  gray  scale  imagery. 

• We are now observing a shortening time lag between research advances and their
applications in industry. It is anticipated that in the future this lag may be as little as
one to two years.

• Advanced electronics hardware is increasing the capabilities and speed of industrial
vision, while simultaneously reducing costs.

*As  computer  vision  systems  move  toward  this  goal,  they  will  increasingly  incorporate  Expert  System  components 
using  multiple  knowledge  sources.  Gevarter  (1982B)  provides  An  Overview  of  Expert  Systems,  in  which  ACRONYM 
and  VISIONS  are  considered  to  be  examples  of  Expert  Systems. 
• It is anticipated that special lighting and active sensing will play an increasing role in
industrial vision.

• Common programming languages and improved interface standards will, within the next 3 to
10 years, enable easier integration of vision with robots and into the industrial environment.

8.  Future  Applications 

•  It  is  anticipated  that  about  one  quarter  of  all  industrial  robots  will  be  equipped  with  some 
form  of  vision  system  by  1990. 

• It is likely that on the order of 90% of all industrial inspection activities requiring vision
will be done with computer vision systems within the next decade.

• New vision system applications in a wide variety of areas, as yet unexplored, will begin to
appear within this decade. An example of such a system might be visual traffic monitors at
intersections that could perceive cars, pedestrians, etc., in motion, and control the flow of
traffic accordingly.

•  Computer  vision  will  play  a  large  role  in  future  military  applications.  The  Defense  Mapping 
Agency  intends  to  achieve  fully  automated  production  for  mapping,  charting  and  geodesy 
by  1995,  utilizing  “expert  system”-guided  computer  vision  facilities. 

L.  Conclusion 

In conclusion, the amount of activity and the number of researchers in the computer vision field
suggest that within the next 5 to 10 years we should see some startling advances in practical
computer vision, though practical general vision systems remain a long way off.
III. NATURAL LANGUAGE PROCESSING (NLP)*


A.  Introduction 

One major goal of Artificial Intelligence (AI) research has been to develop the means to
interact with machines in natural language (in contrast to a computer language). The interaction
may be typed, printed or spoken. The complementary goal has been to understand how humans
communicate. The scientific endeavor aimed at achieving these goals has been referred to as
computational linguistics (or more broadly as cognitive science), an effort at the intersection
of AI, linguistics, philosophy and psychology.

Human communication in natural language is an activity of the whole intellect. AI researchers,
in trying to formalize what is required to properly address natural language, find themselves
involved in the long term endeavor of having to come to grips with this whole activity. (Formal
linguists tend to restrict themselves to the structure of language.) The current AI approach is to
conceptualize language as a knowledge-based system for processing communications and to create
computer programs to model that process.

Communication acts can serve many purposes, depending on the goals, intentions and
strategies of the communicator. One goal of the communication is to change some aspect of the
recipient’s mental state. Thus, communication endeavors to add or modify knowledge, change a
mood, elicit a response or establish a new goal for the recipient.

For  a  computer  program  to  interpret  a  relatively  unrestricted  natural  language  communication, 
a  great  deal  of  knowledge  is  required.  Knowledge  is  needed  of: 

—  the  structure  of  sentences 

—  the  meaning  of  words 

—  the  morphology  of  words 

—  a  model  of  the  beliefs  of  the  sender 

—  the  rules  of  conversation,  and 

—  an  extensive  shared  body  of  general  information  about  the  world. 

This  body  of  knowledge  can  enable  a  computer  (like  a  human)  to  use  expectation-driven 
processing  in  which  knowledge  about  the  usual  properties  of  known  objects,  concepts,  and  what 
typically  happens  in  situations,  can  be  used  to  understand  incomplete  or  ungrammatical  sentences 
in  appropriate  contexts. 

B.  Applications 

There  are  many  applications  for  computer-based  natural  language  understanding  systems. 
Some of these are listed in Table III-1.


*A  more  complete  treatment  of  NLP  is  given  in  Gevarter  (1983). 
TABLE III-1. Some Applications of Natural Language Processing.

Discourse
    Speech Understanding
    Story Understanding

Information Access
    Information Retrieval
    Question Answering Systems
    Computer-Aided Instruction

Information Acquisition or Transformation
    Machine Translation
    Document or Text Understanding
    Automatic Paraphrasing
    Knowledge Compilation
    Knowledge Acquisition

Interaction with Intelligent Programs
    Expert Systems Interfaces
    Decision Support Systems
    Explanation Modules for Computer Actions
    Interactive Interfaces to Computer Programs

Interacting with Machines
    Control of Complex Machines

Language Generation
    Document or Text Generation
    Speech Output

Writing Aids: e.g., grammar checking


C.  Approach 

Natural  Language  Processing  (NLP)  systems  utilize  both  linguistic  knowledge  and  domain 
knowledge  to  interpret  the  input.  As  domain  knowledge  (knowledge  about  the  subject  area  of 
communication)  is  so  important  to  understanding,  it  is  usual  to  classify  the  various  systems  based 
on  their  representation  and  utilization  of  domain  knowledge.  On  this  basis,  Hendrix  and 
Sacerdoti  (1981)  classify  systems  as  Types  A,  B,  or  C,*  with  Type  A  being  the  simplest,  least 
capable  and  correspondingly  least  costly  systems. 

1.  Type  A:  No  World  Models 

a.  Key  Words  or  Patterns 

The  simplest  systems  utilize  ad  hoc  data  structures  to  store  facts  about  a  limited  domain.  Input 
sentences  are  scanned  by  the  programs  for  predeclared  key  words,  or  patterns,  that  indicate 
known  objects  or  relationships. 
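
The key word approach can be illustrated with a minimal sketch in Python. The fact table, the key word sets and the function name `answer` are all hypothetical, invented here for illustration; they are not taken from any of the systems surveyed.

```python
# Hypothetical facts about a tiny domain, held in an ad hoc structure.
FACTS = {
    ("captain", "enterprise"): "Smith",
    ("port", "enterprise"): "Norfolk",
}

# Predeclared key words that indicate known relationships and objects.
RELATION_KEYWORDS = {"captain", "port"}
OBJECT_KEYWORDS = {"enterprise"}


def answer(sentence):
    """Scan the sentence for one known relationship and one known object."""
    words = [w.strip("?.,!").lower() for w in sentence.split()]
    relation = next((w for w in words if w in RELATION_KEYWORDS), None)
    obj = next((w for w in words if w in OBJECT_KEYWORDS), None)
    if relation is None or obj is None:
        return None  # no predeclared key words were found
    return FACTS.get((relation, obj))


print(answer("Who is the captain of the Enterprise?"))
```

Note that the sketch ignores word order and sentence structure entirely, which is why systems of this kind are limited to narrow domains.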

b.  Limited  Logic  Systems 

In limited logic systems, information in the data base is stored in some formal notation, and
language mechanisms are used to translate the input into the internal form. The internal form
is chosen so as to facilitate performing logical inferences on information in the data base.

*Other  system  classifications  are  possible,  e.g.,  those  based  on  the  range  of  syntactic  coverage. 
2. Type B: Systems That Use Explicit World Models

In these systems, knowledge about the domain is explicitly encoded, usually in frame or
network representations (discussed in a later section) that allow the system to understand input
in terms of context and expectations. Cullingford’s work (see Schank and Abelson, 1977) on SAM
(Script Applier Mechanism) is a good example of this approach.

3. Type C: Systems that Include Information about the Goals and Beliefs of Intelligent Entities

These advanced systems (still in the research stage) attempt to include in their knowledge base
information about the beliefs and intentions of the participants in the communication. If the goal
of the communication is known, it is much easier to interpret the message. Schank and Abelson’s
(1977) work on plans and themes reflects this approach.

D.  The  Parsing  Problem 

For more complex systems than those based on key words and pattern matching, language
knowledge is required to interpret the sentences. The system usually begins by “parsing” the
input (processing an input sentence to produce a more useful representation for further analysis).
This representation is normally a structural description of the sentence indicating the relationship
of the component parts. To address the parsing problem and to interpret the result, the
computational linguistic community has studied syntax, semantics, and pragmatics. Syntax is the
study of the structure of phrases and sentences. Semantics is the study of meaning. Pragmatics is
the study of the use of language in context.

E.  Grammars 

Barr and Feigenbaum (1981, p. 229) state, “A grammar of a language is a scheme for
specifying the sentences allowed in the language, indicating the syntactic rules for combining
words into well-formed phrases and clauses.” The following grammars are some of the most
important.*

1.  Phrase  Structure  Grammar  —  Context  Free  Grammar 
Chomsky (see, e.g., Winograd, 1983) had a major impact on linguistic research by devising a
mathematical approach to language. He defined a series of grammars based on rules for rewriting
sentences into their component parts. He designated these as Types 0, 1, 2 and 3, based on the
restrictions associated with the rewrite rules, with Type 3 being the most restrictive.

Type 2, the Context-Free (CF) or Phrase Structure Grammar (PSG), has been one of the most
useful in natural-language processing. It has the advantage that all sentence structure derivations
can be represented as a tree and practical parsing algorithms exist. Though it is a relatively natural
grammar, it is unable to capture all the sentence constructions found in most natural languages
such as English. Gazdar (1981) has recently broadened the applicability of CF PSG by adding
augmentations to handle situations that do not fit the basic grammar. This Generalized Phrase
Structure Grammar is now being developed by Hewlett Packard (Gawron et al., 1982).


*Charniak  and  Wilks  (1976)  provide  a  good  overview  of  the  various  approaches. 
2. Transformational Grammar

Tennant (1981, p. 89) observes that “The goal of a language analysis program is recognizing
grammatical sentences and representing them in a canonical structure (the underlying structure).”
A transformational grammar (Chomsky, 1957) consists of a dictionary, a phrase structure
grammar and a set of transformations. In analyzing a sentence, a parse tree is first produced
using the phrase structure grammar. This is called the surface structure. The transformational
rules are then applied to the parse tree to transform it into a canonical form called the deep (or
underlying) structure. As the same thing can be stated in several different ways, there may be
many surface structures that translate into a single deep structure.

3.  Case  Grammar 

Case Grammar is a form of Transformational Grammar in which the deep structure is based on
cases, which are semantically relevant syntactic relationships. The central idea is that the deep
structure of a simple sentence consists of a verb and one or more noun phrases, each associated
with the verb in a particular relationship. Fillmore (1971) proposed the following cases: Agent,
Experiencer, Instrument, Object, Source, Goal, Location, Type and Path.

The  cases  for  each  verb  form  an  ordered  set  referred  to  as  a  “case  frame.”  A  case  frame  for  the 
verb  “open”  would  be: 

(object  (instrument)  (agent)) 

which indicates that “open” always has an object, but the instrument or agent can be omitted, as
indicated by their surrounding parentheses. Thus the case frame associated with the verb provides
a template which aids in understanding a sentence.
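
As a rough illustration, the case frame for “open” can be held as a small data structure and used as a template when filling in a sentence’s cases. This is only a sketch: the dictionary layout and the function name `fill_case_frame` are assumptions made here, not a representation prescribed by Fillmore or by any system discussed in this report.

```python
# Case frame for "open" as given in the text: the object is required,
# while the instrument and agent are optional.
CASE_FRAMES = {
    "open": {"required": ["object"], "optional": ["instrument", "agent"]},
}


def fill_case_frame(verb, **cases):
    """Check the supplied cases against the verb's frame and fill its slots."""
    frame = CASE_FRAMES[verb]
    allowed = frame["required"] + frame["optional"]
    unknown = [c for c in cases if c not in allowed]
    if unknown:
        raise ValueError(f"cases not in the frame for {verb!r}: {unknown}")
    missing = [c for c in frame["required"] if c not in cases]
    if missing:
        raise ValueError(f"missing required cases for {verb!r}: {missing}")
    # Optional cases that were omitted are recorded as None.
    return {c: cases.get(c) for c in allowed}


# "The key opened the door": object and instrument present, agent omitted.
print(fill_case_frame("open", object="door", instrument="key"))
```

A sentence that supplies no object for “open” is rejected, which is the sense in which the frame acts as a template of expectations.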

4.  Semantic  Grammars 

For practical systems in limited domains, it is often more useful to replace conventional
syntactic constituents, such as noun phrases, verb phrases and prepositions, with meaningful
semantic components. Thus, in place of nouns when dealing with a naval data base, one
might use ships, captains, ports and cargos. This approach gives direct access to the semantics of
a sentence and substantially simplifies and shortens the processing. Grammars based on this
approach are referred to as semantic grammars (see, e.g., Burton, 1976).
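
A minimal sketch of the idea follows, assuming a single hypothetical rule, QUERY → "what is the" ATTRIBUTE "of the" SHIP, for a naval data base. The vocabulary, the rule and the function name `parse_query` are invented for illustration and do not come from Burton (1976) or any fielded system.

```python
# Hypothetical semantic categories that take the place of "noun".
SHIPS = {"kennedy", "nimitz"}
ATTRIBUTES = {"captain", "port", "cargo"}


def parse_query(sentence):
    """Recognize QUERY -> what is [the] ATTRIBUTE of [the] SHIP."""
    words = sentence.lower().rstrip("?").split()
    words = [w for w in words if w != "the"]  # articles carry no meaning here
    if len(words) == 5 and words[:2] == ["what", "is"] and words[3] == "of":
        attribute, ship = words[2], words[4]
        if attribute in ATTRIBUTES and ship in SHIPS:
            # The parse result is already a data base request.
            return {"attribute": attribute, "ship": ship}
    return None


print(parse_query("What is the captain of the Kennedy?"))
```

Because the grammar's categories are the data base's own concepts, a successful parse needs no separate semantic interpretation step.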

5.  Other  Grammars 

A variety of other, but less prominent, grammars have been devised. Still others can be
expected to be devised in the future. One example is Montague Grammar (Dowty et al., 1981), which
uses a logical functional representation for the grammar and therefore is well suited for the
parallel-processing logical approach now being pursued by the Japanese (see Nishida and
Doshita, 1982) for their future AI work as embodied in their Fifth Generation Computer research
project.
F. Semantics and the Cantankerous Aspects of Language

Semantic processing (as it tries to interpret phrases and sentences) attaches meanings to the
words. Unfortunately, English does not make this as simple as looking up the word in the
dictionary, but presents many difficulties which require context and other knowledge to resolve.
Examples are:

1.  Multiple  Word  Senses 

Syntactic analysis can resolve whether a word is used as a noun or a verb, but further analysis is
required to select the sense (meaning) of the noun or verb that is actually used. For example,
“fly” used as a noun may be a winged insect, a fancy fishhook, a baseball hit high in the air, or
any of several other things. The appropriate sense can be determined by context (e.g.,
for “fly” the appropriate domain of interest could be extermination, fishing or sports), or by
matching each noun sense with the senses of other words in the sentence. This latter approach was
taken by Rieger and Small (1979) using the (still embryonic) technique of “interacting word
experts,” and by Finin (1980) and McDonald (1982) as the basis for understanding noun
compounds.
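
Selecting a sense from the domain of interest can be sketched as follows. The three senses of “fly” and their domains come from the text; the cue words used to detect each domain, and the function name `select_sense`, are assumptions made purely for illustration.

```python
# Noun senses of "fly" from the text, keyed by domain of interest.
SENSES = {
    "fly": {
        "extermination": "winged insect",
        "fishing": "fancy fishhook",
        "sports": "baseball hit high in the air",
    }
}

# Hypothetical cue words that signal each domain.
DOMAIN_CUES = {
    "extermination": {"swatted", "buzzing", "insect"},
    "fishing": {"trout", "cast", "rod"},
    "sports": {"outfielder", "inning", "caught"},
}


def select_sense(word, sentence):
    """Pick the sense whose domain shares the most cue words with the sentence."""
    context = {w.strip(".,").lower() for w in sentence.split()}
    scores = {
        domain: len(DOMAIN_CUES[domain] & context) for domain in SENSES[word]
    }
    best = max(scores, key=scores.get)
    return SENSES[word][best] if scores[best] > 0 else None


print(select_sense("fly", "He cast the fly toward the trout."))
```

When no cue word is present the sketch returns nothing, reflecting the point that the sense cannot be chosen without context.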

2.  Pronouns 

Pronouns  allow  a  simplified  reference  to  previously  used  (or  implied)  nouns,  sets  or  events. 
Where  feasible,  using  pragmatics,  pronoun  antecedents  are  usually  identified  by  reference  to  the 
most  recent  noun  phrase  having  the  same  context  as  the  pronoun. 

3.  Ellipsis  and  Substitution 

Ellipsis  is  the  phenomenon  of  not  stating  explicitly  some  words  in  a  sentence,  but  leaving  it  to 
the  reader  or  listener  to  fill  them  in.  Substitution  is  similar  —  using  a  dummy  word  in  place  of  the 
omitted  words.  Employing  pragmatics,  ellipses  and  substitutions  are  usually  resolved  by  matching 
the  incomplete  statement  to  the  structures  of  previous  recent  sentences  —  finding  the  best  partial 
match  and  then  filling  in  the  rest  from  this  matching  previous  structure. 
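
The matching step for ellipsis can be sketched in a much simplified form, assuming that the previous sentence has already been reduced to a set of labeled slots. The categories, the slot names and the function name `resolve_ellipsis` are hypothetical, and a real system would score several candidate structures rather than only the most recent one.

```python
# Hypothetical semantic categories for a small naval domain.
CATEGORIES = {
    "ship": {"kennedy", "nimitz"},
    "attribute": {"captain", "port", "cargo"},
}


def categorize(word):
    """Return the category a word belongs to, or None if it is unknown."""
    for category, members in CATEGORIES.items():
        if word in members:
            return category
    return None


def resolve_ellipsis(previous, fragment):
    """Overlay the fragment on the previous structure; the rest is filled in."""
    resolved = dict(previous)
    matched = False
    for word in fragment.lower().rstrip("?").split():
        category = categorize(word)
        if category in resolved:
            resolved[category] = word
            matched = True
    return resolved if matched else None


# "What is the captain of the Kennedy?" followed by "The Nimitz?"
previous = {"attribute": "captain", "ship": "kennedy"}
print(resolve_ellipsis(previous, "The Nimitz?"))
```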

G.  Knowledge  Representation* 

As  the  AI  approach  to  natural  language  processing  is  heavily  knowledge  based,  it  is  not  surpris¬ 
ing  that  a  variety  of  knowledge  representation  (KR)  techniques  have  found  their  way  into  the 
field.  Some  of  the  more  important  ones  are: 

1. Procedural Representations: The meanings of words or sentences are expressed as
computer programs that reason about their meaning.


*More complete presentations on KR can be found in Chapter III of Barr and Feigenbaum (1981), and in Part C of this
volume.
2. Declarative Representations

a.  Logic  —  Representation  in  First  Order  Predicate  Logic,  for  example. 

b.  Semantic  Networks  —  Representations  of  concepts  and  relationships  between  concepts  as 
graph  structures  consisting  of  nodes  and  labeled  connecting  arcs. 

3.  Case  Frames  —  (covered  earlier) 

4. Conceptual Dependency: This approach (related to case frames) is an attempt to provide a
representation of all actions in terms of a small number of semantic primitives into which input
sentences are mapped (see, e.g., Schank and Riesbeck, 1981). The system relies on 11 primitive
physical, instrumental and mental ACTs (PROPEL, GRASP, SPEAK, ATTEND, PTRANS, ATRANS, etc.),
plus several other categories or concept types.

5.  Frame  —  A  complex  data  structure  for  representing  a  whole  situation,  complex  object  or 
series  of  events.  A  frame  has  slots  for  objects  and  relations  appropriate  to  the  situation. 

6.  Scripts  —  Frame-like  data  structures  for  representing  stereotyped  sequences  of  events  to  aid  in 
understanding  simple  stories. 
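
To suggest how a script aids story understanding, the sketch below stores a stereotyped event sequence and uses it to supply events that a story leaves unstated. The restaurant event list and the function name `fill_in_events` are illustrative assumptions, not the actual contents of any script used in SAM.

```python
# A hypothetical restaurant script: a stereotyped, ordered sequence of events.
RESTAURANT_SCRIPT = ["enter", "be seated", "order", "eat", "pay", "leave"]


def fill_in_events(script, mentioned):
    """Return the script events spanned by the story.

    Each event is paired with True if the story stated it, or False if it
    was inferred from the script.
    """
    positions = [script.index(event) for event in mentioned if event in script]
    if not positions:
        return []  # the story does not match this script
    first, last = min(positions), max(positions)
    return [(event, event in mentioned) for event in script[first:last + 1]]


# Story: "John ordered a hamburger. He paid and went home."
print(fill_in_events(RESTAURANT_SCRIPT, ["order", "pay"]))
```

Here the unstated event “eat” is supplied by the script, which is the kind of expectation-driven inference described above.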

H.  Syntactic  Parsing 

Parsing assigns structures to sentences. The following types have been developed over the years
for NLP (Barr and Feigenbaum, 1981).

1. Template Matching: Most of the early (and some current) NL programs performed parsing by
matching their input sentences against a series of stored templates.

2.  Transition  Nets: 

Sentences described by a phrase structure grammar can be syntactically decomposed using a set of
rewrite rules such as those indicated in Figure III-1. Observe that a simple sentence can be
rewritten as a Noun Phrase and a Verb Phrase as indicated by:

S → NP VP

The noun phrase can be rewritten by the rule

NP → (DET) (ADJ*) N (PP*)

where the parentheses indicate that the item is optional, while the asterisks (associated with the
adjectives and prepositional phrases) indicate that any number of items may occur.

An  example  of  an  analyzed  noun  phrase  is  shown  in  Figures  III-2  and  III-3. 

As the transition networks analyze a sentence, they can collect information about the word
patterns they recognize and fill slots in a frame associated with each pattern. Thus, they can
identify noun phrases as singular or plural, determine whether the nouns refer to persons and, if
so, their gender, and gather other information needed to produce a deep structure. A simple
approach to collecting this information is to attach subroutines to be called for each transition.
A transition network with such subroutines attached is called an “augmented transition network,”
or ATN. With ATN’s, word patterns can be
GRAMMAR

S → NP VP

NP → (DET) (ADJ*) N (PP*)

PP → PREP NP

VP → VTRAN NP

Figure III-1. A Transition Network for a Small Subset of English.

Each  diagram  represents  a  rule  for  finding  the  corresponding  word  pattern.  Each  rule  can  call  on 
other  rules  to  find  needed  patterns. 


After Graham (1979, p. 214).


NP
The payload on a tether under the shuttle

DET  N        PP
The  payload  on a tether under the shuttle

              PREP  NP
              on    a tether under the shuttle

                    DET  N       PP
                    a    tether  under the shuttle

                                 PREP   NP
                                 under  the shuttle

                                        DET  N
                                        the  shuttle

Figure  III-2.  Example  Noun  Phrase  Decomposition. 


Figure III-3. Parse Tree Representation of the Noun Phrase Surface Structure.
recognized. For each word pattern, we can fill slots in a frame. The resulting filled frames provide
a basis for further processing.
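
The noun phrase rules of Figure III-1 can be traced with a small recursive parser, sketched below in Python. It is a plain (unaugmented) transition network: each function plays the role of one network and calls the others as needed, and no slot-filling subroutines are attached. The lexicon and the function names are assumptions made for illustration only.

```python
# Hypothetical lexicon covering the example noun phrase.
LEXICON = {
    "the": "DET", "a": "DET",
    "payload": "N", "tether": "N", "shuttle": "N",
    "on": "PREP", "under": "PREP",
}


def category(words, i):
    """Word category at position i, or None past the end of the input."""
    return LEXICON.get(words[i]) if i < len(words) else None


def parse_np(words, i):
    """NP -> (DET) (ADJ*) N (PP*). Returns (tree, next_index) or None."""
    children = []
    if category(words, i) == "DET":      # optional determiner
        children.append(("DET", words[i]))
        i += 1
    while category(words, i) == "ADJ":   # any number of adjectives
        children.append(("ADJ", words[i]))
        i += 1
    if category(words, i) != "N":        # the noun is required
        return None
    children.append(("N", words[i]))
    i += 1
    while True:                          # any number of prepositional phrases
        result = parse_pp(words, i)
        if result is None:
            break
        tree, i = result
        children.append(tree)
    return ("NP", children), i


def parse_pp(words, i):
    """PP -> PREP NP. Returns (tree, next_index) or None."""
    if category(words, i) != "PREP":
        return None
    result = parse_np(words, i + 1)
    if result is None:
        return None
    tree, j = result
    return ("PP", [("PREP", words[i]), tree]), j


words = "The payload on a tether under the shuttle".lower().split()
tree, end = parse_np(words, 0)
print(tree)
```

Because each prepositional phrase is attached to the nearest preceding noun, the result reproduces the nesting of Figure III-2; other attachments of “under the shuttle” are possible but are not explored by this sketch.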

3.  Other  Parsers 

Other parsing approaches have been devised, but ATN’s remain the most popular syntactic
parsers. ATN’s are top-down parsers in that the parsing is directed by an anticipated sentence
structure. An alternative approach is bottom-up parsing, which examines the input words along
the string from left to right, building up all possible structures to the left of the current word
as the parser advances. A bottom-up parser could thus build many partial sentence structures that
are never used, but the diversity could be an advantage in trying to interpret input word strings
that are not clearly delineated sentences or contain ungrammatical constructions or unknown
words. There have been recent attempts to combine the top-down with the bottom-up approach
for NLP, in a manner similar to what has been done for Computer Vision.

For  a  recent  overview  of  parsing  approaches  see  Slocum  (1981). 

I.  Semantics,  Parsing  and  Understanding 

The  role  of  syntactic  parsing  is  to  construct  a  parse  tree  or  similar  structure  of  the  sentence  to 
indicate  the  grammatical  use  of  the  words  and  how  they  are  related  to  each  other.  The  role  of  the 
semantic  processing  is  to  establish  the  meaning  of  the  sentence.  This  requires  facing  up  to  all  the 
cantankerous  ambiguities  discussed  earlier. 

Charniak (1981) observes that there have been two main lines of attack on word sense
ambiguity. One is the use of discrimination nets (Rieger and Small, 1979) that utilize the syntactic
parse tree (by observing the grammatical role that the word plays, such as taking a direct object,
etc.) in helping to decide the word sense. The other approach is based on the frame/script idea
(used, e.g., for story comprehension) that provides a context and the expected sense of the word
(see, e.g., Schank and Abelson, 1977).

Charniak indicates that semantics at the level of the word sense is not the end of the parsing
process; what is desired is understanding or comprehension (associated with pragmatics).
Here frames, scripts and more advanced topics such as plans, goals, and knowledge
structures (see, e.g., Schank and Riesbeck, 1981) play an important role.

J.  Natural  Language  Processing  (NLP)  Systems 

As  indicated  below,  various  NLP  systems  have  been  developed  for  a  variety  of  functions. 

1.  Kinds 

a.  Question  Answering  Systems 

Question  answering  natural  language  systems  have  perhaps  been  the  most  popular  of  the  NLP 
research  systems.  They  have  the  advantage  that  they  usually  utilize  a  data-base  for  a  limited 
domain  and  that  most  of  the  user  discourse  is  limited  to  questions. 

b.  Natural  Language  Interfaces  (NLI’s) 

These systems are designed to provide a painless means of communicating questions or
instructions to a complex computer program.
c. Computer-Aided Instruction (CAI)

Arden  (1980,  p.  465)  states: 

One  type  of  interaction  that  calls  for  ability  in  natural  languages  is  the  interaction  needed  for  effective  teaching 
machines.  Advocates  of  computer-aided  instruction  have  embraced  numerous  schemes  for  putting  the  computer 
to  use  directly  in  the  educational  process.  It  has  long  been  recognized  that  the  ultimate  effectiveness  of  teaching 
machines  is  linked  to  the  amount  of  intelligence  embodied  in  the  programs.  That  is,  a  more  intelligent  program 
would  be  better  able  to  formulate  the  questions  and  presentations  that  are  most  appropriate  at  a  given  point  in  a 
teaching  dialogue,  and  it  would  be  better  equipped  to  understand  a  student’s  response,  even  to  analyze  and 
model  the  knowledge  of  the  student,  in  order  to  tailor  the  teaching  to  his  needs. 

d.  Discourse 

Systems  that  are  designed  to  understand  discourse  (extended  dialogue)  usually  employ 
pragmatics.  Pragmatic  analysis  requires  a  model  of  the  mutual  beliefs  and  knowledge  held  by  the 
speaker  and  listener. 

e.  Text  Understanding 

Though  Schank  (see  Schank  and  Riesbeck,  1981)  and  others  have  addressed  themselves  to  this 
problem,  much  more  remains  to  be  done.  Techniques  for  understanding  printed  text  include 
scripts  and  causative  approaches. 

f. Text Generation

There are two major aspects of text generation: one is the determination of the content and
textual shape of the message; the second is transforming it into natural language. There are two
approaches for accomplishing this. The first is indexing into canned text and combining it as
appropriate. The second is generating the text from basic considerations. McDonald’s thesis (1980)
provides one of the most sophisticated approaches to text generation.

2.  Research  NLP  Systems 

Until  recently,  virtually  all  of  the  NLP  systems  generated  were  of  a  research  nature.  These  NLP 
systems  basically  were  aimed  at  serving  five  functions: 

a.  Interfaces  to  Computer  Programs 

b.  Data  Base  Retrieval 

c.  Text  Understanding 

d.  Text  Generation 

e.  Machine  Translation 

Gevarter  (1983)  includes  a  survey  of  research  NLP  systems. 

3.  Commercial  Systems: 

The  commercial  systems  available  today  (together  with  their  approximate  price)  are  listed  in 
Table  III-2.  Several  of  these  systems  are  derivatives  of  past  research  NLP  systems. 
TABLE III-2. Some Commercial Natural Language Systems.

System: INTELLECT (derivative of ROBOT); $50K/system; also distributed as ON-LINE ENGLISH
    (Cullinane) and GRS Executive (Information Sciences)
Organization: Artificial Intelligence Corp., Waltham, Mass.
Purpose: NLI for Data Base Retrieval (other extensions underway)
Comments:
    • Several hundred systems sold
    • Takes about 2 weeks to implement for a new data base
    • Written in PL-1
    • Available for mainframes

System: PEARL (based on SAM and PAM); $250K/system
Organization: Cognitive Systems, New Haven, Conn.
Purpose: Custom NLI’s. The first system, Explorer, is an interface to an existing map
    generating system. Others are interfaces to data bases.
Comments:
    • Large start-up cost in building the knowledge base
    • Several systems have been, and are being, built
    • Written in LISP

System: Straight Talk (derivative of LIFER); $660
Organization: Dictaphone; written by Symantec, Sunnyvale, CA
Purpose: Highly portable NLI for DBMS for micro-computers
Comments:
    • Written in PASCAL. Designed to be very compact and efficient. Available about Nov. 1983
    • User customized

System: SAVVY; $950
Organization: SAVVY Marketing International, Sunnyvale, CA
Purpose: System Interface for micro-computers
Comments:
    • Not linguistic. Uses adaptive (best fit) pattern matching to strings of characters
    • Released 3/82
    • User customized

System: Weidner System; $16K/language direction
Organization: Weidner Communications Corp., Provo, UT
Purpose: Semi-Automatic Natural Language Translation
Comments:
    • Linguistic approach. Written in FORTRAN IV
    • Translation with human editing is approximately 100 words/hr (up to eight times as fast
      as human alone)
    • Approx. 20 sold by end of 1982, mainly to large multi-national corporations

System: ALPS
Organization: ALPS, Provo, UT
Purpose: Interactive Natural Language Translation
Comments:
    • Linguistic approach
    • Uses a dictionary that provides the various translations for technical words as a display
      to the human translator, who then selects among the displayed words
TABLE III-2. Some Commercial Natural Language Systems (cont.)

System: NLMENU
Organization: Texas Instruments, Inc., Dallas, TX
Purpose: NLI to Relational Data Bases
Comments:
    • Menu Driven NL Query System
    • All queries constructed from the menu fall within the linguistic and conceptual coverage
      of the system. Therefore, all queries entered are successful.
    • Grammars used are semantic grammars written in a context-free grammar formalism.
    • Producing an interface to any arbitrary set of relations is automated and only requires a
      15-30 minute interaction with someone knowledgeable about the relations in question.
    • System will be available late in 1983 as a software package for a micro-computer.


K.  State  of  the  Art 

It is now feasible to use computers to deal with natural language input in highly restricted
contexts. However, interacting with people in a facile manner is still far off, requiring
understanding of where people are coming from: their knowledge, goals and moods.

In today’s computing environment, the only systems that perform robustly and efficiently are
Type A systems, those that do not use explicit world models but depend on key word or pattern
matching and/or semantic grammars. In actual working systems, for both understanding and
text generation, ATN-like grammars can be considered the state of the art.


L.  Principal  U.S.  Participants  in  NLP 

1.  Research  and  Development* 
Non-Profit 

SRI 

MITRE 


*A  review  of  current  research  in  NLP  is  given  in  Kaplan  (1982). 
Universities

Yale U., Dept. of Computer Science
U. of CA, Berkeley, Computer Science Div., Dept. of EECS
Carnegie-Mellon U., Dept. of Computer Science
U. of Illinois, Urbana, Coordinated Science Lab.
Brown U., Dept. of Computer Science
Stanford U., Computer Science Dept.
U. of Rochester, Computer Science Dept.
U. of Mass., Amherst, Department of Computer and Information Science
SUNY, Stony Brook, Dept. of Computer Science
U. of CA, Irvine, Computer Science Dept.
U. of PA, Dept. of Computer and Infor. Science
GA Institute of Technology, School of Infor. and Computer Science
USC, Infor. Science Institute
MIT, AI Lab.
NYU, Computer Science Dept. and Linguistic String Project
U. of Texas at Austin, Dept. of Computer Science
Cal. Inst. of Tech.
Brigham Young U., Linguistics Dept.
Duke U., Dept. of Computer Science
N. Carolina State, Dept. of Computer Science
Oregon State U., Dept. of Computer Science
Purdue U.

Industrial 

BBN 

TRW  Defense  Systems 
IBM,  Yorktown  Heights,  N.Y. 

Burroughs 
Sperry  Univac 

Systems  Development  Corp.,  Santa  Monica 

Hewlett  Packard 

Martin  Marietta,  Denver 

Texas  Instruments,  Dallas 

Xerox  PARC 

Bell  Labs 

Institute  of  Scientific  Information,  Phila.,  PA 
GM  Research  labs,  Warren,  MI 
Honeywell 
2. Principal U.S. Government Agencies Funding NLP Research
ONR  (Office  of  Naval  Research) 

NSF  (National  Science  Foundation) 

DARPA  (Defense  Advanced  Research  Projects  Agency) 

3.  Commercial  NLP  Systems 

Artificial  Intelligence  Corp.,  Waltham,  Mass. 

Cognitive Systems Inc., New Haven, Conn.

Symantec,  Sunnyvale,  CA 
Texas  Instruments,  Dallas,  TX 
Weidner  Communications,  Inc.,  Provo,  Utah 
Savvy  Marketing  International,  Sunnyvale,  CA 
ALPS,  Provo,  UT 

4. Non-U.S.

U.  of  Manchester,  England 
Kyoto  U.,  Japan 
Siemens Corp., Germany
U.  of  Strathclyde,  Scotland 

Centre  National  de  la  Recherche  Scientifique,  Paris 

U.  di  Udine,  Italy 

U.  of  Cambridge,  England 

Philips Res. Labs, The Netherlands

M.  Forecast 

Commercial natural language interfaces (NLI’s) to computer programs and data base management systems are now becoming available. The imminent advent of NLI’s for micro-computers is a precursor to virtually anyone eventually having direct access to powerful computational systems.

Because the cost of computing has continued to fall while the cost of programming has not, it has already become cheaper in some applications to create NLI systems (that utilize subsets of English) than to train people in formal programming languages.

Computational  linguists  and  workers  in  related  fields  are  devoting  considerable  attention  to  the 
problems  of  NLP  systems  that  understand  the  goals  and  beliefs  of  the  individual  communicators. 
Though  progress  has  been  made,  and  feasibility  has  been  demonstrated,  more  than  a  decade  will 
be  required  before  useful  systems  with  these  capabilities  will  become  available. 

One of the problems in implementing new installations of NLP systems is gathering information about the applicable vocabulary and the logical structure of the associated data bases. Work is now underway to develop tools to help automate this task. Such tools should be available within 5 years.

For text understanding, experimental programs have been developed that “skim” stylized text such as short disaster stories in newspapers (DeJong, 1982). Despite the practical problems of providing sufficient world knowledge and extending language coverage as required, practical tools emerging from these efforts should be available within this decade to assist humans doing text understanding.

The NRL Computational Linguistic Workshop (1981) concluded that text generation techniques are maturing rapidly and new application possibilities will appear within the next five years.

The  NRL  workshop  also  indicated  that: 

Machine aids for human translators appear to have a brighter prospect for immediate application than fully automatic translation; however, the Canadian French-English weather bulletin project is a fully automatic system in which only 20% of the translated sentences require minor rewording before public release. An ambitious common market project involving machine translation among six European languages is scheduled to begin shortly. Sixty people will be involved in that undertaking which will be one of the largest projects undertaken in computational linguistics.* The panel was divided in its forecast on the five year perspective of machine translation but the majority were very optimistic.

Nippon Telegraph and Telephone Corp. in Tokyo has a machine translation AI project underway. An experimental system for translating from Japanese to English and vice versa is now being demonstrated. In addition, the recently initiated Japanese Fifth Generation Computer effort has computer-based natural language understanding as one of its major goals.

In summary, natural language interfaces using a limited subset of English are now becoming available. Hundreds of specialized systems are already in operation. Major efforts in text understanding and machine translation are underway, and useful (though limited) systems will be available within the next five years. Systems that are heavily knowledge-based and handle more complete sets of English should be available within this decade. However, systems that can handle unrestricted natural discourse and understand the motivation of the communicators remain a distant goal, probably requiring more than a decade before useful systems appear.
IV. SPEECH RECOGNITION AND SPEECH UNDERSTANDING


A.  Introduction 

Speech is our fastest means of communication, being about twice as fast as the average typist. It is also nearly effortless: speech doesn’t need visual or physical contact and it places few restrictions on the use of the hands or the mobility of the body. Speech is thus well suited to communication with a machine when the individual is engaged in other activities. Its effortlessness also makes it desirable for operating a computer, and it is a long term candidate for direct text preparation (automatic dictation).

Speech  understanding  systems  have  all  the  difficulties  of  natural  language  understanding  plus 
the  problem  of  interpreting  the  speech  signal  with  all  its  noise  and  variability.  As  a  result,  speech 
understanding  is  one  of  the  most  difficult  AI  subjects,  being  a  perception  task  related  to  the  scene 
understanding  problem  in  computer  vision.  Though  the  constraining  aspects  of  natural  language 
help  reduce  the  magnitude  of  the  task,  it  remains  a  major  problem  area. 

Speech  systems  can  be  categorized  into  speech  recognition  systems  and  speech  understanding 
systems,  the  former  task  being  considerably  easier.  In  addition,  the  systems  further  divide  into 
those  that  work  with  isolated  words  and  those  that  can  handle  connected  speech,  the  latter  being 
perhaps  an  order  of  magnitude  more  difficult  than  the  former. 

Finally, speech systems are also classified as speaker dependent and speaker independent. The former systems must be trained to recognize the particular speakers using them.

The  heart  of  the  speech  problem  (that  gives  rise  to  the  above  classifications)  is  the  difficulty  of 
recognizing  the  speech  signal,  but  before  we  explore  that  area,  let  us  briefly  look  at  applications 
for  speech  devices. 

B.  Applications 

There are many applications emerging for speech recognition and speech understanding systems. Some of these are listed in Tables IV-1 and IV-2.

C. The Nature of Speech Sounds

It is beginning to be realized that acoustics and phonetics may be the key to speech understanding. Zue (1981) argues that human spectrogram-reading experiments indicate that phonetic recognition in speech systems can be improved substantially, which would result in much more capable speech systems.

Speech  recognition  is  based  primarily  on  the  identification  of  words.  An  adult  speaker  may 
know  100,000  of  the  300,000  words  in  the  English  language.  Each  language  has  a  basic  set  of 
speech  sounds  called  phonemes.  In  English  there  are  only  about  40  phonemes,  compared  with 
some  10,000  for  the  next  largest  speech  unit,  the  syllable. 

The  sounds  that  make  up  human  speech  are  generated  by  the  flow  of  air  through  the  vocal  tract 
in  three  ways  (Levinson  and  Liberman,  1981): 
TABLE IV-1. Speech Recognition Applications


Manufacturing  Processes  and  Control 

•  Quality  control  data  entry  into  computers 

•  Shipping  and  receiving  —  record  entry,  package  sorting 

•  Maintenance  and  repair  orders  —  part 
availability,  work  needed  or  under  way. 

•  CAD/CAM 

Office  Automation 

•  Executive  work  station 

•  Word  processing 

•  Data  entry 

•  Control  functions 

Technical  Data  Gathering 

•  Cartography  —  inputs  when  working  with  maps. 

•  Working  with  blueprints 

• Medical applications:

— Dental records

— Pathology

— Services for the handicapped

— Operating room logging

— Command/control of medical instrumentation

Security  Applications 

•  Building  access 

•  Computer  file  access 

•  Communications  security 

•  Speaker  verification/identification 

Consumer  Products  Applications 

•  Control  functions 

•  Status  queries 

Equipment  Subsystem  Operation 

•  Aircraft 

•  Spacecraft 

•  Military  equipment 
TABLE IV-2. Speech Understanding Applications


•  Universal  access  to  large  data  bases  via  the  telephone  network. 

•  Automatic  telephone  transaction  systems  —  Airline  reservations  and  inquiries. 

•  Command  and  Control 

—  Military 

— Business

•  Operation  of  complex  machines. 


1. The vocal cords can be made to vibrate, producing a sound whose frequency is referred to as pitch.

2.  A  constriction  can  be  formed  in  the  vocal  tract,  narrow  enough  to  cause  turbulence, 
resulting  in  noise-like  sounds,  like  that  used  to  produce  “f”. 

3.  Pressure  built  up  behind  a  closure  (such  as  the  lips)  can  release  a  burst  of  acoustic  energy  as 
in  the  pronunciation  of  consonants,  such  as  “p”,  “t”  and  “k”. 

These  three  sources  of  speech  sound  are  shaped  acoustically  by  the  time-varying  physical  shape 
of  the  vocal  tract. 

One way to characterize the speech signal is by its Fourier transform, which specifies the amplitude and phase of each of the frequencies in the Fourier series of the signal. As the phase makes little perceptual difference, the signal is represented in practice by its amplitude spectrum, in a representation called a spectrogram.
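As a rough illustration of the amplitude-spectrum representation, the following Python sketch computes the discrete Fourier transform of successive frames of a signal and discards the phase. The function names, frame length and hop size are assumptions made for this example only; a practical system would normally apply a window function and use a fast Fourier transform rather than the direct sum shown here.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Return the DFT magnitudes |X[k]| for k = 0..N/2 of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))  # keep amplitude, discard phase
    return spectrum


def spectrogram(signal, frame_len, hop):
    """Amplitude spectra of successive (possibly overlapping) frames."""
    return [amplitude_spectrum(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Hypothetical test signal: a pure tone that falls exactly on bin 4
# of a 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
frames = spectrogram(tone, frame_len=32, hop=16)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
```

Each row of `frames` is one time slice of the spectrogram; for the tone used here the largest amplitude sits in the same frequency bin in every slice.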

D.  Isolated  Word  Recognition 

Figure IV-1 indicates a basic paradigm for speech recognition. The signal is first operated upon to emphasize the 2 to 3 kHz frequency range, filtered to chop off high frequencies (>8 kHz), then digitized. The end points of the word are detected, and a set of parameters representing the word is generated. This set is then matched with stored parameter sets in the system’s vocabulary, and the word with the closest match is chosen. For a word, the acoustic signal varies both in duration and amplitude each time the same speaker says it. Thus it may have to be warped to achieve the best comparison with the reference, a task that is one of the toughest problems for a speech recognizer. The warping is usually accomplished by dynamic programming.
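The warping step can be sketched as a dynamic-programming alignment, commonly called dynamic time warping. The Python example below is a minimal illustration, not the method of any particular recognizer: it assumes each frame is summarized by a single number (real systems use a vector of parameters per frame), and the template values and function names are hypothetical.

```python
def dtw_distance(utterance, template):
    """Minimum cumulative frame distance over all monotonic time alignments."""
    inf = float("inf")
    n, m = len(utterance), len(template)
    # cost[i][j] = best cost of aligning the first i utterance frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(utterance[i - 1] - template[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch template frame
                                     cost[i][j - 1],      # stretch utterance frame
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose stored template warps closest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-parameter-per-frame templates for a two-word vocabulary.
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 2, 4]}
slow_yes = [1, 1, 3, 3, 4, 3, 1]   # same shape as "yes", spoken more slowly
best_word = recognize(slow_yes, templates)
```

Because the alignment may repeat frames on either side, the slower utterance still matches its template closely even though the two sequences differ in length.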

Doddington  and  Schalk  (1981,  p.  28)  state  that: 

The  most  common  means  of  feature  extraction  is  direct  measurement  of  spectrum  amplitude,  with,  for  example, 
a  set  of  16  bandpass  filters.  Another  means  is  measurement  of  the  zero-crossing  rate  of  the  signal  in  several  broad 
frequency  bands  to  give  an  estimate  of  the  formant  [resonant]  frequencies  in  these  bands.  Yet  another  means  is 
representing  the  speech  signal  in  terms  of  the  parameters  of  a  filter  whose  spectrum  best  fits  that  of  the  input 
speech  signal.  This  technique  known  as  linear  predictive  coding  (LPC)  has  gained  popularity  because  it  is 
efficient,  accurate,  and  simple. 
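Of the feature-extraction methods quoted above, the zero-crossing rate is the simplest to illustrate. The Python sketch below counts sign changes in a frame and converts the rate to a frequency estimate; for a single sinusoid there are two crossings per cycle, so the estimate is half the crossing rate times the sampling rate. The sampling rate, test tone and function names are assumptions for this example, and a real recognizer would first split the signal into frequency bands.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def estimate_frequency(frame, sample_rate):
    """Frequency estimate for a roughly sinusoidal frame (two crossings per cycle)."""
    return zero_crossing_rate(frame) * sample_rate / 2.0


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz for 80 samples.
sample_rate = 8000
frame = [math.cos(2 * math.pi * 1000 * t / sample_rate + 0.1) for t in range(80)]
estimate = estimate_frequency(frame, sample_rate)
```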




E.  Recognizing  Continuous  Speech 

For continuous speech, rather than attempting to match all possible word patterns, it is often more efficient to work with speech units much smaller than words, particularly phonemes. Breaking down the speech signal into these smaller components and giving them symbols is referred to as segmentation and labeling. Usually, several phoneme labels are assigned to each segment by a pattern-matching process, which also assigns a probability value representing the goodness of the match. With the appropriate acoustic-phonetic knowledge, it is possible to combine, regroup, and delete segments to form larger phoneme units. The lexical knowledge of word pronunciation can now be used to generate a multiplicity of word hypotheses. For a sufficiently limited vocabulary, and perhaps also employing some syntactic and word boundary knowledge, speech recognition can be achieved.
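The step from labeled segments to word hypotheses can be sketched as follows. This Python example is a simplified illustration under stated assumptions: each segment carries a dictionary of candidate phoneme labels with match probabilities, each word has a single pronunciation with exactly one phoneme per segment (so the combining and deleting of segments is omitted), and the lexicon and phoneme symbols are hypothetical.

```python
def word_hypotheses(segments, lexicon):
    """Rank words by the product of the label probabilities along their pronunciation."""
    scored = []
    for word, phonemes in lexicon.items():
        if len(phonemes) != len(segments):
            continue  # simplification: one phoneme per segment
        score = 1.0
        for labels, phoneme in zip(segments, phonemes):
            score *= labels.get(phoneme, 0.0)
        if score > 0.0:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical labeled segments: several phoneme labels per segment,
# each with a goodness-of-match probability.
segments = [
    {"b": 0.6, "p": 0.4},
    {"ih": 0.7, "eh": 0.3},
    {"t": 0.8, "d": 0.2},
]
lexicon = {
    "bit": ("b", "ih", "t"),
    "pit": ("p", "ih", "t"),
    "bed": ("b", "eh", "d"),
    "cat": ("k", "ae", "t"),
}
ranked = word_hypotheses(segments, lexicon)
```

The result is an ordered list of competing word hypotheses; syntactic or word-boundary knowledge would then be used to choose among them.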

F.  Speech  Understanding 

Arden  (1980,  pp.  475,  478)  observes  that: 

Speech-understanding systems differ somewhat from recognition systems, in that they have access to and make effective use of task-specific knowledge in the analysis and interpretation of speech. Further, the criteria for performance are somewhat relaxed, in that the errors that count are not the errors in speech recognition, but errors in task accomplishment.

To successfully decode the unknown utterance, a speech perception system must effectively use the many diverse sources of knowledge about the language, the environment, and the context. These sources of knowledge include the characteristics of speech sounds (acoustic-phonetic), variability in pronunciation (phonology), the stress and intonation patterns of speech (prosodics), the sound patterns of words and sentences (lexicon), the grammatical structure of language (syntax), the meaning of words and sentences (semantics), and the context of the conversation (pragmatics) . . .

What makes speech perception a challenging and difficult area of A.I. is the fact that error and ambiguity permeate all the levels of the speech-decoding process . . . .

The grammatical structure of sentences can be viewed principally as a mechanism for reducing search by restricting the number of acceptable alternatives . . . .

Barr  and  Feigenbaum  (1981,  p.  332)  note  that  the  types  of  knowledge  at  the  various  levels  in 
processing  spoken  knowledge  include  (from  the  signal  level  up): 

1.  Phonetics  —  representations  of  the  physical  characteristics  of  the  sounds  in  all  of  the  words  in  the  vocabulary. 

2.  Phonemics  —  rules  describing  variations  in  pronunciation  that  appear  when  words  are  spoken  together  in 
sentences  (coarticulation  across  word  boundaries,  “swallowing”  of  syllables,  etc.); 

3.  Morphemics  —  rules  describing  how  morphemes  (units  of  meaning)  are  combined  to  form  words  (formation 
of  plurals,  conjugations  of  verbs,  etc.); 

4. Prosodics — rules describing fluctuation in stress and intonation across a sentence;

5.  Syntax  —  the  grammar  or  rules  of  sentence  formation  resulting  in  important  constraints  on  the  number  of 
sentences  (not  all  combinations  of  words  in  the  vocabulary  are  legal  sentences); 

6.  Semantics  —  the  “meaning”  of  words  and  sentences,  which  can  also  be  viewed  as  a  constraint  on  the  speech 
understander  (not  all  grammatically  legal  sentences  have  a  meaning  —  e.g.,  The  snow  was  loud);  and,  finally, 

7.  Pragmatics  —  rules  of  conversation  (in  a  dialogue,  a  speaker’s  response  must  not  only  be  a  meaningful 
sentence  but  also  be  a  reasonable  reply  to  what  was  said  to  him).  For  instance,  it  is  pragmatic  knowledge  that 
tells  us  that  the  question  “Can  you  tell  me  what  time  it  is?”  requires  more  than  just  a  Yes  or  No  response. 

Using  this  knowledge,  the  hierarchical  structure  leading  to  speech  understanding  can  be 
characterized  as  shown  in  Figure  IV-2. 
Figure IV-2. The Processing Hierarchy in Speech Understanding

G. The ARPA Speech Understanding Research (SUR) Project

1. Introduction

In  1971,  ARPA  (The  Advanced  Research  Projects  Agency)  initiated  a  five  year  speech 
understanding  research  effort  that  proved  to  be  one  of  the  most  significant  projects  in  AI  history. 
Not  only  did  it  greatly  advance  our  knowledge  of  speech,  but  it  also  provided  new  insights  on  how 
to  structure  and  control  a  complex  “expert  system.” 

Lea and Shoup (1979) reported that the ARPA SUR project had the highly ambitious goals of understanding, with 90% accuracy, continuous speech from a 1000 word vocabulary spoken by several cooperative speakers under near ideal conditions of quiet rooms and high-fidelity equipment. It was intended that the processing take no more than several times real-time using large, very fast computers.

There were three principal complete systems developed under the project: HEARSAY II and HARPY at Carnegie Mellon University (CMU), and HWIM (Hear What I Mean) at Bolt, Beranek and Newman (BBN). In 1976, the ARPA goals were essentially met at CMU, with HARPY exhibiting a 95% accuracy and HEARSAY II achieving a 90% accuracy. HWIM had a substantially lower accuracy, but utilized a more difficult vocabulary. (HWIM’s domain was Travel Budget Management. HEARSAY II’s and HARPY’s was Retrieval of AI Documents.) These three systems were heavily knowledge-based and are now considered to be expert systems.

All the ARPA SUR systems utilized a combination of bottom-up and top-down processing. The lower levels used knowledge about the variable phonetic composition of the words in the vocabulary (lexicon) to interpret pieces of the speech signal by comparing them with prestored patterns. The top level aided in recognition by building expectations about which words the speaker was likely to say, using syntactic and semantic constraints (Barr and Feigenbaum, 1981, pp. 326-327).


2.  HEARSAY II 

HEARSAY II is characterized by its cooperative problem-solving system architecture (see Figure IV-3), which employs a set of programmed “specialists” (Knowledge Sources: KS’s) interacting via a shared common blackboard on which their decisions are recorded. The blackboard can be visualized as a global data structure representing a multi-level network of alternative hypotheses.

HEARSAY  has  a  total  of  12  KS’s,  which  at  the  lower  levels  created  syllable  class  hypotheses 
from  segments,  word  hypotheses  from  syllables,  etc.  At  the  higher  levels,  KS’s  acted  to:  predict 
all  possible  words  that  might  syntactically  precede  or  follow  a  phrase,  create  phrase  hypotheses 
from  verified  contiguous  word-phrase  pairs,  etc. 

The majority of the hypotheses contributed by the KS’s at any level did not end up in the final interpretation of the sentence. Instead, only the most likely hypotheses were chosen for expansion. The individual KS’s operated somewhat independently and asynchronously as pattern-invoked programs, triggered when matching patterns appeared on the blackboard. To economize on computing resources, each hypothesis was rated and (using an appropriate scheduling routine) the most likely patterns were expanded first.
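The blackboard idea can be sketched in a few lines of Python. This is only an illustration of the general pattern (shared hypotheses, pattern-invoked knowledge sources, and a rating-ordered agenda), not a reconstruction of HEARSAY II itself: the class, the single knowledge source, the lexicon and the ratings are all hypothetical, and the loop runs the knowledge sources one after another rather than asynchronously.

```python
import heapq


class Blackboard:
    """Shared store of rated hypotheses plus an agenda ordered by rating."""

    def __init__(self):
        self.hypotheses = []   # every (level, content, rating) posted so far
        self.agenda = []       # heap of (-rating, sequence number, hypothesis)
        self._sequence = 0

    def post(self, level, content, rating):
        hypothesis = (level, content, rating)
        self.hypotheses.append(hypothesis)
        heapq.heappush(self.agenda, (-rating, self._sequence, hypothesis))
        self._sequence += 1


# Hypothetical lexicon used by the one knowledge source in this sketch.
LEXICON = {("doc", "u", "ment"): "document"}


def syllables_to_word(hypothesis):
    """Knowledge source: fires only on syllable-level hypotheses it can match."""
    level, content, rating = hypothesis
    if level == "syllables" and content in LEXICON:
        return [("word", LEXICON[content], rating * 0.9)]
    return []


def run(blackboard, knowledge_sources, max_steps=10):
    """Expand the highest-rated pending hypothesis first."""
    for _ in range(max_steps):
        if not blackboard.agenda:
            break
        _, _, hypothesis = heapq.heappop(blackboard.agenda)
        for source in knowledge_sources:
            for level, content, rating in source(hypothesis):
                blackboard.post(level, content, rating)


board = Blackboard()
board.post("syllables", ("doc", "u", "ment"), 0.8)
board.post("syllables", ("dog", "u", "ment"), 0.3)
run(board, [syllables_to_word])
words = [content for level, content, _ in board.hypotheses if level == "word"]
```

In this toy run only the higher-rated syllable hypothesis leads to a word-level hypothesis; the other is examined later and produces nothing, which mirrors the point that most hypotheses never reach the final interpretation.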
Figure IV-3. A Block Diagram of the CMU Hearsay-II System Organization

3. HARPY

A  crude  way  of  thinking  of  HARPY  is  as  a  compiled  version  of  HEARSAY  II.  HARPY  uses  a 
single  precompiled  network  knowledge  structure.  Barr  and  Feigenbaum  (1981,  p.  349)  report: 

The  network  contains  knowledge  at  all  levels:  acoustic,  phonemic,  lexical,  syntactic,  and  semantic.  It  stores 
acoustic  representations  of  every  possible  pronunciation  of  the  words  in  all  of  the  sentences  that  HARPY 
recognizes.  The  alternative  sentences  are  represented  as  paths  through  the  network,  and  each  node  in  the  network 
is  a  template  of  allophones  (distinctive  variations  of  phonemes,  dependent  on  adjacent  phonemes). 

The  paths  through  the  network  can  be  thought  of  as  “sentence  templates,”  much  like  the  word  templates  used  in 
isolated-word  recognition. 

HARPY uses a heuristic method called “beam search” for searching for the sentence in the network that most closely matches the input signal. HARPY proceeds from left to right through the network, matching spoken sounds to allophonic states and assigning scores based on the goodness of the match. HARPY keeps the paths with the best cumulative scores, pruning away others which fall some threshold amount below the best scoring path (Erman et al., 1980).
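A beam search of this kind can be sketched as below. The Python example is a toy under stated assumptions, not HARPY's actual network or scoring: each state holds a single-number template, the match score is an absolute difference treated as a cost (lower is better), and the state names, network and beam width are hypothetical.

```python
def beam_search(network, start, finals, templates, frames, beam_width):
    """Left-to-right search keeping only paths within beam_width of the best cost."""
    # active maps each surviving state to (cumulative cost, path so far).
    active = {start: (abs(frames[0] - templates[start]), [start])}
    for frame in frames[1:]:
        candidates = {}
        for state, (cost, path) in active.items():
            for nxt in network[state]:
                new_cost = cost + abs(frame - templates[nxt])
                if nxt not in candidates or new_cost < candidates[nxt][0]:
                    candidates[nxt] = (new_cost, path + [nxt])
        if not candidates:
            return None
        best = min(cost for cost, _ in candidates.values())
        # Prune every path that falls more than beam_width behind the best one.
        active = {state: entry for state, entry in candidates.items()
                  if entry[0] <= best + beam_width}
    finished = [entry for state, entry in active.items() if state in finals]
    return min(finished, key=lambda entry: entry[0]) if finished else None


# Hypothetical network with two alternative "sentences": sil-a1-a2 and sil-b1-b2.
templates = {"sil": 0.0, "a1": 1.0, "a2": 2.0, "b1": 5.0, "b2": 6.0}
network = {
    "sil": ["a1", "b1"],
    "a1": ["a1", "a2"],
    "a2": ["a2"],
    "b1": ["b1", "b2"],
    "b2": ["b2"],
}
frames = [0.1, 1.2, 0.9, 2.1]
result = beam_search(network, "sil", {"a2", "b2"}, templates, frames, beam_width=3.0)
```

With these values the `b1` branch is pruned after the second frame, so the later frames are matched only against the surviving path. The pruning is what makes the method heuristic: a path discarded early cannot be recovered even if it would have scored best overall.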

4.  HWIM 

The HWIM (Hear What I Mean) speech understanding system was developed at BBN. HWIM’s domain was that of travel budget management. HWIM’s organization is shown in Figure IV-4. The lower components digitize the speech signal and generate a parametric representation of it, which is then segmented and labeled into phonemes which are ranked as to the quality of their match. These ranked phonemes are pictured as a segmented lattice, which is a graph that is divided into time segments and read from left to right. This graph is matched against a dictionary of word pronunciations (stored as a network with phonemes for nodes) by lexical retrieval components which generate word hypotheses.

HWIM’s higher levels include information about trips (semantics), syntax and word verification. The verification component takes the pronunciation of hypothesized words and generates a synthesized parameter representation that is compared to the parameters generated from the actual signal.

HWIM  has  a  central  control  which  uses  the  system’s  knowledge  sources  as  subroutines.  The 
system  extends  bottom-up  theories  using  the  top-down  syntactic  and  semantic  components.  The 
system  expands  its  hypotheses  about  the  first  recognized  word  in  the  sentence. 

5.  Summary  of  the  ARPA  SUR  Program 

ARPA’s program did not result in a useable speech understanding system. The resulting systems were too slow, too restricted and required large computational resources. However, it did discover and elucidate much new information about speech, and developed new architectural insights, particularly the blackboard architecture that has since been used in other AI systems. Performances of the different systems were difficult to compare because of the different vocabularies and domains employed. One critical factor in comparison is the average branching factor (ABF). This refers to the average number of words that might come next after each word in a legal sentence. Table IV-3 summarizes the three major ARPA SUR projects. Note that the ABF is 196 for HWIM’s database retrieval task, versus 33 for HEARSAY’s and HARPY’s document retrieval task.
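The average branching factor can be illustrated with a toy grammar. In the Python sketch below the grammar is represented, as a simplifying assumption, by a table giving the words allowed to follow each word; the words themselves are hypothetical, and words that can only end a sentence are left out of the average.

```python
def average_branching_factor(successors):
    """Mean number of allowed next words, over words that have any successor."""
    counts = [len(next_words) for next_words in successors.values() if next_words]
    return sum(counts) / len(counts)


# Hypothetical successor table for a tiny document-retrieval grammar.
successors = {
    "show": ["me", "all"],
    "me": ["papers", "authors", "abstracts"],
    "all": ["papers", "authors", "abstracts"],
    "papers": [],
    "authors": [],
    "abstracts": [],
}
abf = average_branching_factor(successors)
```

Here the recognizer must choose among two or three candidates at each point, giving an ABF of 8/3. A task with an ABF near 200 presents far more competing words at every step than one with an ABF near 30, which is why accuracy figures from the two kinds of task are not directly comparable.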


Figure IV-4. Block Diagram of the BBN HWIM System Organization (after Klatt, 1977)

TABLE IV-3. Summary of ARPA’s Speech Understanding Systems


H.  State  of  the  Art 

1. Speech Recognition

Table IV-4 is a summary of a recent Texas Instruments study of commercial speech recognizers tested on a 20 word vocabulary consisting of the ten spoken digits “zero” through “nine” and ten command words: start, stop, yes, no, go, help, erase, rubout, repeat and enter.

In  1982,  speaker-dependent  connected-word  short-string,  small  vocabulary  (approx.  50  words) 
recognizers  were  commercially  available.  These  could  recognize  up  to  90  wpm  of  connected 
speech  compared  to  a  typical  person’s  speaking  rate  of  150  wpm.  The  vocabulary  size  is  usually 
less  than  150  words,  but  is  application  dependent.  Recognition  accuracies  of  98%  or  greater  are 
being  achieved  in  factory  environments.  Current  turnkey  systems  are  in  the  $5K  to  $75K  range. 
Consumer  product  speech-recognizer  subsystems  for  toys,  personal  computers,  voice-controlled 
appliances,  etc.,  cost  from  $6  to  $100. 

Voice recognition systems are here, viable, proven, but still somewhat costly. In industry applications, they have demonstrated large increases in productivity. Hundreds of successful installations exist today. Plohar (1983) discusses the human factor considerations associated with successful applications.

2.  Speech  Understanding 

There  are  no  commercial  true  speech-understanding  systems  today.  However,  there  are  a 
number  of  U.S.  companies  working  on  future  commercial  systems. 

a.  Bell  Labs 

Bell Labs has been working on a semantic sentence recognizer and interpreter utilizing a finite state grammar and a small vocabulary. The intent is to produce an interactive speech understanding system for use over the telephone (Levinson and Liberman, 1981).

b.  IBM  —  T.J.  Watson  Res.  Center 

IBM has had the largest effort in continuous-speech recognition and understanding, capitalizing on the HARPY “Beam Search” approach.

c.  Other  organizations  involved  in  developing  speech  understanding  systems  include  BBN. 

I.  Who  Is  Doing  Speech  Recognition  Related  Work 

1.  Commercial  Organizations 
IBM 
TI 

Bell  Labs 
Verbex 

Nippon  Electric 

Threshold  Technology 

Interstate  Electronics 

Matsushita 

Scott  Instruments 

Sanyo 

INTEL 

ITT  (San  Diego) 

Fairchild 
TABLE IV-4. T.I.’s Test of Speech Recognizers on Individual Words

Hewlett Packard
Haskins  Lab 
Lincoln  Labs 

Speech  Communications  Research  Lab 

Sperry  Univac 

Votan 

Voice  Machine  Communications 
Voice  Processing  Corp. 

General  Instrument  —  Milton  Bradley 
Voice  Control  Systems 

2.  Universities 
M.I.T. 

C.M.U. 

V.P.I. 

U  of  CA  at  Berkeley 


J.  Problems  and  Issues* 

•  Speech  perception  at  the  acoustic  level  is  a  critical  factor  in  achieving  advanced  recognition 
capability.  Current  commercial  word  recognizers  have  not  yet  made  full  use  of  available 
knowledge. 

• Widespread use of speech recognizers awaits the availability of low cost connected-speech systems achieving better than a 99% accuracy with limited vocabularies (100 words).

• The capabilities of a word recognizer depend on the answers to three questions:

(1)  Can  it  recognize  connected  speech? 

(2)  Is  it  speaker  independent? 

(3)  How  big  a  vocabulary  can  it  recognize? 

•  The  greatest  difficulty  that  speech  recognizers  have  is  determining  word  end-points  —  the 
source  of  many  word-recognition  errors  for  isolated  word  recognizers. 

•  A  major  problem  is  separating  linguistically  significant  variations  in  the  speech  signal  from 
insignificant  variations  (such  as  variations  in  word  pronunciations). 

•  Noise  is  also  a  major  problem  in  speech  recognition,  often  resulting  from  actions  of  the 
speaker  himself. 

•  Large  vocabulary  size  is  a  problem  to  users,  who  need  to  remember  what  the  machine  can 
recognize. 

•  The  two  main  errors  made  by  speech  recognizers  are: 

(1)  Substitution,  and 

(2)  Rejection 

•  Other  less  common  errors  are  insertion  and  deletion. 

•  There  are  as  yet  no  standards  for  test  or  evaluation  of  systems  —  a  major  problem. 

*These have been gleaned primarily from Doddington and Schalk (1981).

• It is not the number of words that is the major difficulty, but how close their sounds are to one another. The natural English alphabet is a particularly difficult set of sounds to distinguish.

• Software or hardware may also have idiosyncrasies that adversely affect recognition performance.

•  As  recognizer  performance  improves,  evaluation  becomes  more  difficult,  because  more 
testing  is  required  to  achieve  statistical  significance. 

• The pronunciation of individual words changes depending on the adjacent words in the sentence.

• The hypothesize-and-test approach needs abundant computer power, a major factor limiting its commercial use.

•  Integrating  recognizers  into  an  application  requires  substantial  software  and  human  factors 
considerations.  This  has  limited  real-world  adoption. 
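The point above about evaluation becoming harder as recognizers improve can be made concrete with a small calculation. The Python sketch below uses the standard normal approximation to the binomial distribution to estimate how many test utterances are needed to measure an error rate to within a chosen relative precision; the function name, the 95% confidence multiplier (z = 1.96) and the 25% precision target are assumptions chosen for illustration.

```python
import math


def trials_needed(error_rate, relative_precision, z=1.96):
    """Approximate number of test utterances needed so that the measured error
    rate lies within +/- relative_precision * error_rate at the confidence
    level implied by z (normal approximation to the binomial)."""
    n = (z ** 2) * (1.0 - error_rate) / (relative_precision ** 2 * error_rate)
    return math.ceil(n)


# Halving the error rate roughly doubles the testing required.
tests_at_98_percent = trials_needed(0.02, 0.25)   # 98% accurate recognizer
tests_at_99_percent = trials_needed(0.01, 0.25)   # 99% accurate recognizer
```

Under these assumptions a recognizer that is 98% accurate needs about 3,000 test utterances, while one that is 99% accurate needs about 6,000 to reach the same relative precision.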

K.  Future  Trends 

It is anticipated that speaker-independent, continuous-speech recognition systems with limited vocabularies (10-20 words), having an accuracy of 98% or better, will be available by the mid-1980’s. Automatic dictation will probably not appear before the 1990’s, with Japanese language systems being the first to appear. (The Japanese language has only on the order of 500 syllables, compared to 10K for English.) Speech understanding is a major part of the Japanese 5th Generation Computer Project (Feigenbaum and McCorduck, 1983).

Due  to  the  advancement  in  VLSI,  it  is  expected  that  voice  recognition  chips  for  toys  will  soon 
be  in  the  $6  range  —  $50  for  a  complete  system. 

A  strong  expectation  is  that  a  speech  understanding  system  using  a  natural  language  parser  will 
be  introduced  by  IBM  in  the  mid-80’s. 

Around  1990,  true  commercial  speech  understanding  systems,  having  the  capabilities  of  the 
ARPA  SUR  systems  but  operating  in  near  real-time,  are  expected  to  appear. 

By  1990,  speech  recognition  and  understanding  is  expected  to  be  a  billion  dollar  a  year  industry 
(Elphick,  1982). 
V. SPEECH SYNTHESIS


A.  Introduction 

Speech synthesis (speech output from a computer) is an emerging technology whose products
are already becoming commonplace. Though the present market for these devices is still
small, the future looks very bright.

Speech  synthesis  is  not  normally  considered  an  AI  topic,  though  it  is  sure  to  play  an  important 
part  in  many  future  AI  systems,  particularly  when  coupled  with  speech  understanding.  One  may 
very  well  consider  these  synthesis  systems,  which  employ  rules  (often  heuristic)  for  deriving 
speech  from  stored  speech  elements,  as  an  example  of  an  “expert  system  on  a  chip.” 

B.  Why  Synthesis 

One approach to making speech available when needed is to record the speech and play it back
as required. The disadvantages are that mechanical playback devices are often unreliable and
that the ability to generate new sentences from stored words is quite limited because of access
time. This approach is therefore unsuitable for most computer-based applications.

A more reliable approach is to use digital sound recording techniques, enabling speech to be
stored in solid-state memories having no moving parts to break down. The disadvantage is that an
enormous amount of storage is required: on the order of 50,000 bits per second of digital speech
(at the typical speaking rate of 150 words per minute). However, if words are represented by the
digital code for their letters, the same information requires only about 100 bits per second of
speech. This difference of two to three orders of magnitude highlights the importance of speech
compression for any digital representation of speech, not only to reduce storage requirements, but
also to vastly reduce the bandwidth required for electronic speech transmission. All speech
synthesis methods use some form of speech compression.
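
The storage comparison above can be checked with a little arithmetic. The sketch below is illustrative only: the speaking rate and the 50,000 bits per second figure come from the text, while the average word length (about five letters plus a space) and the 7-bit character code are assumptions made here.

```python
import math

WORDS_PER_MINUTE = 150     # speaking rate quoted in the text
DIGITIZED_BPS = 50_000     # digitized-speech rate quoted in the text
CHARS_PER_WORD = 6         # assumption: about five letters plus a space
BITS_PER_CHAR = 7          # assumption: 7-bit character code

# Bits per second needed if speech is stored as the letters of its words.
words_per_second = WORDS_PER_MINUTE / 60
text_bps = words_per_second * CHARS_PER_WORD * BITS_PER_CHAR

# How much larger the digitized waveform is, as a ratio and in powers of ten.
ratio = DIGITIZED_BPS / text_bps
orders_of_magnitude = math.log10(ratio)

print(f"letter-coded text: about {text_bps:.0f} bits per second")
print(f"digitized speech is about {ratio:.0f} times larger")
print(f"difference: {orders_of_magnitude:.1f} orders of magnitude")
```

Under these assumptions the letter-coded rate lands near the 100 bits per second quoted above, and the ratio falls between two and three orders of magnitude.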

Speech  synthesis  serves  three  basic  purposes: 

1)  Recreating  speech  from  a  compressed  speech  representation 

2)  Generating  speech  from  stored  speech  elements  such  as  by  concatenating  representations  for 
words,  and 

3)  Generating  speech  from  text. 

The first purpose is associated with minimizing storage or transmission bandwidth
requirements, the second with creating speech from stored components under microprocessor or
computer control, and the third with reading machines and computer-human interaction.

An indication of applications of speech synthesis is given in Table V-1.


C.  Human  Speech 

As  many  speech  synthesizers  actually  employ  an  approximate  simulation  of  the  human  speech 
production  mechanism,  it  is  helpful  to  briefly  review  human  speech  and  its  generation.  Human 
TABLE V-1. Applications of Speech Synthesis.


Military 

•  Operation  of  military  equipment 

•  Warnings 

•  Reminders 

•  Service  and  operation  aids 

•  Trainers  and  simulators 

•  Secure  communications 

Computer 

•  Communication  by  computers  to  users. 

Consumer 

•  Talking  appliances 

•  Teaching  devices 

•  Toys 

•  Talking  typewriters  and  calculators 

•  Talking  watches 

•  Automobile  warning  devices,  reminders,  and  annunciators  for  instruments 

•  Devices  for  the  blind 

•  Communication  for  the  speech  handicapped 

Telecommunications

•  Synthesized  telephone  messages 

•  Speech  compression  for  “store  and  forward,”  to  reduce  communication  costs 

•  Vocal  delivery  of  electronic  mail 

Industrial 

•  Speaking  instruments 

•  Speaking  cash  registers 

•  Alarm  systems 

•  Automated  office  equipment 

•  Industrial  process  control 

•  Station  and  floor  announcers  for  trains,  buses,  elevators,  etc. 

•  Systems  operations  where  the  operators  have  their  visual  attention  elsewhere 

•  Emergency  warning  devices  for  airplanes,  machines,  etc. 

•  Control  room  annunciators  for  sensors 

•  Text  readers 

•  Data  entry  (with  vocal  verification) 
speech consists basically of a combination of vocal sounds such as vowels, fricative sounds such
as f, th, or sh, and plosive or stop consonant sounds such as b and d.

The  human  vocal  tract  can  be  considered  as  an  acoustic  tube  terminated  at  one  end  by  the  vocal 
cords  and  at  the  other  end  by  the  lips.  This  resonant  tube  has  a  side  branch  —  the  nasal 
resonator  —  separated  by  a  flap  called  the  velum. 

Voiced sounds are produced by forcing air from the lungs past the tensed vocal cords, which
are thus forced to vibrate, emitting puffs of air into the vocal tract. (The puff frequency, about
100 hertz in males and 200 hertz in females, is a function of the vocal cord size and tenseness.)
These puffs of air excite the vocal tract, stimulating its resonant (formant) frequencies. Most of
the resulting sound energy is contained in these resonant responses, the frequencies of which can be
varied by changing the shape of the vocal tract by moving the lips, jaw, or tongue.

Fricative  sounds  occur  when  a  constriction  in  the  vocal  tract  leads  to  turbulent  air  flows  after 
the  constriction. 

Plosives are generated by briefly closing the vocal tract until pressure builds up and then
releasing the pressure.

D.  Electronic  Simulation  of  the  Speech  Mechanism 

The three basic human speech sounds can be electronically simulated as follows, as illustrated
by the Computalker Consultants Model CT-1* synthesizer shown schematically in Figure V-1.

Voiced sounds can be simulated by passing energy from a variable periodic source
(corresponding to the vocal cord puffs) through a series of variable filters (f1, f2, f3) corresponding
to the vocal tract resonances (formants). Plosive sounds are produced the same way, but require
rapid changes in the amplitude parameters A0 and An. Fricative sounds are produced by passing
white noise through a variable filter (ff). Some sounds, such as v and z, are produced using both
the periodic and noise mechanisms.

Using this approach, human speech can be simulated by controlling the frequency parameters
(fi) and the amplitude parameters (Ai) over time. Some variant of this basic method, referred to
as parametric coding, is used in all speech synthesizers that simulate human speech production.
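
The source-and-filter arrangement just described can be sketched in a few lines. This is a minimal illustration, not the CT-1 design: the two-pole digital resonator, the 8 kHz sample rate, the filter bandwidths, and the formant values in the usage lines are all assumptions chosen for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for illustration


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole resonant filter standing in for one variable filter."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)   # pole radius (< 1, stable)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(pitch_hz, n_samples):
    """Periodic source: one unit pulse per pitch period (the vocal cord puffs)."""
    period = int(SAMPLE_RATE / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Noise source used for fricatives; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def voiced(pitch_hz, formants_hz, amplitude, n_samples):
    """Pass the periodic source through the formant filters in series."""
    signal = pulse_train(pitch_hz, n_samples)
    for f in formants_hz:
        signal = resonator(signal, f, bandwidth_hz=100.0)
    return [amplitude * s for s in signal]


def fricative(filter_hz, amplitude, n_samples):
    """Pass white noise through a single variable filter."""
    shaped = resonator(white_noise(n_samples), filter_hz, bandwidth_hz=1000.0)
    return [amplitude * s for s in shaped]


# Hypothetical parameter values: a 100 Hz pitch with three formants, then a hiss.
vowel = voiced(100, (700, 1200, 2600), amplitude=1.0, n_samples=800)
hiss = fricative(3000, amplitude=0.5, n_samples=800)
```

Changing the pitch, formant, and amplitude arguments from one short frame to the next is the software counterpart of varying the control parameters over time.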

E.  Synthesis  in  Speech  Compression  and  Regeneration 

Synthesis  has  the  role  of  regeneration  in  speech  compression  schemes  (associated  with  speech 
storage  or  minimal  bandwidth  speech  transmission). 

There  are  two  basic  speech  compression  techniques  —  frequency  domain  analysis  (parametric 
coding  as  discussed  in  the  previous  section  on  electronic  simulation),  and  time  domain  analysis. 
Frequency  domain  methods  tend  to  dominate  commercial  speech  synthesis,  but  time  domain 
analysis  has  become  important  for  limited-vocabulary  word  synthesis. 

The  frequency  domain  approach  analyzes  the  incoming  speech  to  be  compressed  and  generates 
the  parameters  needed  for  regenerating  the  signal  using  an  electronic  simulation  of  the  vocal 
tract.  In  some  cases,  these  parameters  may  be  further  compressed  for  reduced  storage.  Speech  is 
generated  by  inverting  the  process  as  indicated  in  Figure  V-2. 

Time domain analysis is characterized by waveform compression techniques. Waveform
digitization coding, researched extensively by Bell Labs, takes the original waveform of spoken
words and compresses it using a complicated algorithm. The final compressed waveform is
stored as bits in memory for later reconstruction of the original waveform. Though generally
producing better sounding speech than parametric coding, waveform digitization coding requires two
to four times as much storage as parametric coding.

*No longer in production, but the Philips speech chip essentially does the same thing.

Figure V-1. A Simplified Diagram of the Computalker CT-1 Parametric Synthesizer. (After Sherwood, 1979)

Figure V-2. Recording and Reproduction of Speech Using a Compressed-Speech System. Panels: A. Recording; B. Playback. (After Sherwood, 1979)

F.  Parametric  Coding  Schemes 

1.  Introduction: 

All  frequency  domain  compression  techniques  employ  some  sort  of  electronic  model  of  the 
human  vocal  tract.  Thus,  all  have  one  or  more  filters  to  simulate  vocal  tract  resonances,  and 
periodic  and  noise  energy  sources,  and  are  controlled  by  varying  the  parameters  associated  with 
pitch,  loudness,  and  filter  frequencies. 

2.  Formant  Coding 

This is a straightforward approach to controlling an electronic model of the vocal tract by
controlling the tunable filters using parametric signals that represent the formant (vocal tube
resonant) frequencies, such as those shown in Figure V-1. As the formant frequencies change relatively
slowly, the parameters need to be updated relatively infrequently, thus allowing data
compression.

3.  Linear  Predictive  Coding  (LPC) 

LPC, pioneered by TI for “Speak and Spell,” is a form of formant coding which allows
further compression of the parameters. As the formant frequencies tend to change slowly, current
samples are predicted from weighted linear combinations of previous samples. TI’s clever LPC
prediction approach, and the use of an ingenious lattice filter, greatly simplify the synthesis
circuitry. The resulting system can be implemented on a single chip and produces high quality,
natural sounding speech.
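
The idea of predicting each sample from a weighted sum of earlier samples can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the textbook autocorrelation method with the Levinson-Durbin recursion, which is not claimed to be the procedure used in any particular chip, and the test signal is artificial. The recursion also yields the reflection coefficients on which lattice-filter and PARCOR formulations are built.

```python
def autocorrelation(samples, max_lag):
    """r[lag] = sum of samples[i] * samples[i + lag] over the frame."""
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor weights a[1..order] so that
    predicted x[n] = a[1]*x[n-1] + ... + a[order]*x[n-order].
    Returns (weights, reflection coefficients, residual error energy).
    Assumes r[0] > 0 and a frame that is not perfectly predictable."""
    a = [0.0] * (order + 1)          # a[0] is unused
    reflection = []
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / error
        reflection.append(k_m)
        updated = a[:]
        updated[m] = k_m
        for k in range(1, m):
            updated[k] = a[k] - k_m * a[m - k]
        a = updated
        error *= 1.0 - k_m * k_m
    return a[1:], reflection, error


# Artificial test frame: each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
weights, reflection, residual = levinson_durbin(autocorrelation(frame, 2), 2)
```

For this frame the first weight comes out very close to 0.9 and the second close to zero, as expected for a signal in which each sample depends only on the one before it.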

4.  PARCOR 

PARCOR (partial correlation), utilized by Japanese manufacturers, is a variant of LPC. LPC
extrapolates from a series of formant samples to predict following formant frequencies. Though
most speech patterns change slowly, plosive and fricative sounds involve rapid changes.
PARCOR makes LPC more sensitive to sudden changes by giving greater emphasis to the correlation
between adjacent parametric samples and less to the longer term patterns. However, there
appears to be little resultant subjective difference in observed speech quality between the two
approaches.

5.  Line  Spectrum  Pair  (LSP) 

NTT (Nippon Telegraph and Telephone Public Corp.), which developed PARCOR, has come
up with LSP, an approach allowing still further compression. LSP defines the boundary
conditions for the individual formant frequencies as those corresponding to the open and closed vocal
tract. NTT claims that for a complete system, some 40% more compression can be achieved with
LSP than with PARCOR, while maintaining nearly the same speech quality.
6. Parametric Waveform Coding (PWC)

PWC is another variant of LPC, as used by Centigram’s Voice Ware system to produce
vocabularies for the Lisa Speech Board.* PWC uses a variable-length slice of waveform to
produce the linear prediction coefficients. Each slice (about 20 milliseconds in length) corresponds to
a “glottal event,” the event associated with each puff of air passing through the vocal tract.
Voice Ware uses an array processor to determine 13 linear prediction coefficients for each glottal
event. To synthesize speech, the Lisa Speech Board uses these coefficients and the lengths of the
events to recreate speech waveforms as in other LPC synthesizers. The PWC approach tends to
yield more natural speech than the simpler LPC systems, but requires a higher data rate.

G. Waveform Coding Schemes

1. ADPCM

Adaptive differential pulse code modulation (ADPCM) has been proposed as the worldwide
preferred method of digitizing voice telephone signals for long distance transmission. With an
8 kHz sampling rate and a 4 bit sample size, ADPCM results in 32,000 bits per second (bps).
In ADPCM, the digitized speech is encoded in terms of the amplitude differences between
adjacent samples. These differences are adaptively encoded in terms of quantization level (a function
of the previous quantization level and the previous PCM value). A close relative of ADPCM is
CVSD (continuously variable slope delta modulation).
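
The principle of coding adaptively quantized differences can be sketched as below. This is a deliberately simplified illustration and not the telephone-standard algorithm: the step-size rule, the starting step, the step limits, and the trivial predictor (the previously rebuilt sample) are all assumptions made for the example. Only the 8 kHz rate and 4 bit code size come from the text.

```python
import math

BITS = 4                       # bits per sample, as in the text
SAMPLE_RATE = 8000             # Hz, as in the text
LEVELS = 1 << (BITS - 1)       # 8 magnitude levels; the top bit carries the sign
MIN_STEP, MAX_STEP = 1.0, 2048.0   # assumed limits on the quantizer step


def _update(predicted, step, code):
    """Shared by encoder and decoder: rebuild the sample, then adapt the step."""
    sign = code >> (BITS - 1)
    magnitude = code & (LEVELS - 1)
    delta = (magnitude + 0.5) * step
    predicted += -delta if sign else delta
    if magnitude >= LEVELS - 2:    # large codes: the signal is outrunning the step
        step *= 1.5
    elif magnitude < 2:            # small codes: the step is coarser than needed
        step *= 0.9
    return predicted, min(max(step, MIN_STEP), MAX_STEP)


def encode(samples):
    """Turn samples into 4-bit codes describing sample-to-sample differences."""
    predicted, step, codes = 0.0, 16.0, []
    for sample in samples:
        diff = sample - predicted
        sign = 1 if diff < 0 else 0
        magnitude = min(int(abs(diff) / step), LEVELS - 1)
        code = (sign << (BITS - 1)) | magnitude
        codes.append(code)
        predicted, step = _update(predicted, step, code)
    return codes


def decode(codes):
    """Rebuild the waveform by replaying the same updates the encoder made."""
    predicted, step, out = 0.0, 16.0, []
    for code in codes:
        predicted, step = _update(predicted, step, code)
        out.append(predicted)
    return out


# A 100 Hz test tone, encoded and rebuilt.
tone = [1000.0 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(800)]
codes = encode(tone)
rebuilt = decode(codes)
bit_rate = SAMPLE_RATE * BITS
mean_error = sum(abs(a - b) for a, b in zip(tone, rebuilt)) / len(tone)
```

Because the decoder repeats exactly the same step adaptation as the encoder, no step-size information has to be stored or transmitted alongside the 4 bit codes.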

2.  Mozer’s  Waveform  Coding 

Though ADPCM is suitable for telephone transmission, its high bit rate is unsuitable for stored
speech synthesis. A scheme by Dr. Forrest Mozer of the University of California is a variation of
ADPCM which provides substantial further compression. This technique has been incorporated
into National Semiconductor Corporation’s Digitalker. Dr. Mozer’s approach is to:

1) Analyze the waveform to detect short periods with little change. The waveforms for these
periods are then replaced with identical waveforms.

2)  Fourier  analyze  the  signal  and  adjust  the  phase  angle  of  each  Fourier  component  to  produce 
a  symmetrical  waveform  and  then  discard  half. 

3)  Discard  low  amplitude  portions  of  the  waveform  which  are  not  heard  by  the  ear. 

4)  Employ  ADPCM  to  further  reduce  data. 

The net result of these actions is more than a 40 to one reduction in the data that needs to be
stored, as compared with the data in direct digitization. To produce speech, the process is inverted.
Though the resultant signals look little like the original, the result is very good speech
reproduction.

H.  Coding  the  Words  To  Be  Stored 

Though the schemes discussed thus far provide a huge reduction in the storage
required, generating the required custom vocabulary in terms of the stored parameters requires
hand tailoring by an expert. As yet, there is no acceptable automatic mechanism for directly
converting speech into satisfactory storage elements for encoding schemes that provide high data
compression. (ADPCM is automatic. Parametric schemes can be automated with small residual
errors.)

*No longer in production.

Developing the vocabulary for the Mozer Waveform Coding, used in National
Semiconductor’s Digitalker, takes about one hour of processing per word. It involves working
with the data compression and zero phase-encoding algorithms that produce the stored bit
patterns, making it very difficult for users to program their own custom vocabularies (Ciarcia, 1983).

To enable users to develop their own custom vocabularies for their products when large
vocabularies are required, Centigram Corp. has offered its Voice Ware development system
as a product. With it, users can input tape recorded voice to a digitizer that supplies a 4800 bps
data stream to a microprocessor-based CRT-terminal work station. The station converts the
signal into parametric waveform coding (PWC). The user can then edit the messages, combine
them into files, and feed them back through the Lisa synthesizer to hear how they sound. If the
sound is unsatisfactory, particularly for concatenated phrases, the phrases can be rerecorded to
achieve the desired continuity and balance.

In general, synthesizer users requiring a small custom vocabulary customarily contract
with the synthesizer manufacturer or another development source for the words
required. The cost is on the order of $100 per word for LPC chips.

I.  Generating  Speech  from  Text 

English has some 40 basic speech sounds called phonemes, corresponding to 16 vowel sounds, 6
stops, 8 fricatives, 3 nasals (such as ng), 4 liquids/glides (such as l in lice) and 3 others (such as ch
in church). These sounds vary somewhat depending upon how they are combined into words or
used in speech. These phoneme variations are called allophones. (Texas Instruments developed a
set of 128 allophones to characterize English speech.) Allophones and the rules to string them
together can be stored in computer memory chips. The first text-to-speech system used a
phonemic synthesizer (Votrax). Votrax utilized a hard-wired phoneme-to-parameter converter
which then fed a formant synthesizer to create speech. A simplified text-to-speech system
schematic is given in Figure V-3.

The highly intelligible state-of-the-art speech synthesizer, the Speech Plus “Prose 2000,”
utilizes a generation approach consisting of five serial processes: 1) Text normalization, 2)
Phonemics, 3) Allophonics, 4) Prosodics, 5) Parameter generation. For words not in the
exceptions lexicon, the phonemics process is implemented as a real-time expert system consisting of a
small rule interpreter and an ordered set of about 400 context-sensitive rules.

Figure V-3. Text to Speech Synthesis. Labels recoverable from the figure: “(English, Spanish, etc.)”; “Can be software/firmware”; “Phonemic synthesizer.” (After Sherwood, 1979)
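
A phonemics stage of the kind just described, with an exceptions lexicon consulted first and an ordered list of context-sensitive rules applied otherwise, might be sketched as follows. This is a toy illustration and not the Prose 2000 rule set: the handful of rules, the phoneme symbols, and the one-word lexicon are all hypothetical.

```python
# Hypothetical exceptions lexicon: words whose pronunciation the rules would miss.
EXCEPTIONS = {"one": ["W", "AH", "N"]}

# Ordered rules: (left context, letters, right context, phonemes).
# An empty context matches anything; the first matching rule wins.
RULES = [
    ("", "ch", "", ["CH"]),
    ("", "sh", "", ["SH"]),
    ("", "ll", "", ["L"]),
    ("", "c", "e", ["S"]),     # "c" is soft before "e"
    ("", "c", "i", ["S"]),     # "c" is soft before "i"
    ("", "c", "", ["K"]),      # otherwise "c" is hard
    ("", "a", "", ["AE"]),
    ("", "e", "", ["EH"]),
    ("", "i", "", ["IH"]),
    ("", "l", "", ["L"]),
    ("", "p", "", ["P"]),
    ("", "s", "", ["S"]),
    ("", "t", "", ["T"]),
]


def letters_to_phonemes(word):
    """Convert one word to a phoneme list using the lexicon, then the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    padded = "#" + word + "#"          # "#" marks the word boundaries
    position = 1
    phonemes = []
    while position < len(padded) - 1:
        for left, letters, right, output in RULES:
            if (padded.startswith(letters, position)
                    and padded[:position].endswith(left)
                    and padded.startswith(right, position + len(letters))):
                phonemes.extend(output)
                position += len(letters)
                break
        else:
            position += 1              # no rule applies: skip the letter
    return phonemes


chip = letters_to_phonemes("chip")
cell = letters_to_phonemes("cell")
cat = letters_to_phonemes("cat")
one = letters_to_phonemes("one")
```

Rule order carries the knowledge here: the specific rules for “c” before “e” or “i” must precede the general rule for “c”, which is why the text speaks of an ordered rule set.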

J.  State  of  the  Art 

Elphick  (1981,  p.  42)  notes  that: 

Most commercial synthesizers, especially low-cost ones used for consumer products, derive their speech elements
from recordings of actual human speech. The recorded speech patterns are compressed, and the speech is
disassembled into a vocabulary of small elements for later reassembly into messages.

High quality speech by phoneme synthesizers has been achieved in research systems, but not in
commercial systems. The most natural commercial speech synthesizers use the waveform
approach.

Figure V-4 is an indication of speech quality versus bit storage requirements for the various
synthesis techniques.

Thus  far,  in  industrial  applications,  only  short  messages  are  practical,  as  prolonged  listening  to 
synthetic  speech  tends  to  fatigue  the  operators  (Andreiev,  1981). 

Speech chips with limited vocabularies are available in the range of $10 and up. Constructing
the initial representations for new words (to be stored in ROMs) costs upward of tens of dollars
per word.


Figure V-4. Speech Quality Versus Bit Rate for Various Coding Schemes.

Figure V-5. Text to Speech Conversion. Labels recoverable from the figure: English text (in ASCII code); synthetic speech waveform; D/A converter, amplifier and speaker; speech signal. (After Zue, 1982)
Programming advanced speech synthesizers for speech generation from text is an
enormous task. The flow diagram for such a state-of-the-art system is given in Figure V-5. First,
the printed text must be converted into phonemes by using a combination of rules and a stored
pronouncing dictionary, taking into account pitch, intensity, and duration associated with
emphasis, as influenced by word use determined by the syntax of the sentence. The resultant
allophones (phonemic variations) are then fed to a phonemic voice synthesizer.

The major commercial application thus far for speech generation from text is reading systems
for the blind. These products input text using optical character recognition, and output speech
using a text-to-speech synthesizer. Other applications include electronic mail-to-voice and
proofreading.

K.  Some  Available  Commercial  Systems 

An  indication  of  manufacturers  and  currently  available  commercial  systems  is  given  by  Table 
V-2. 


L.  Problems  and  Issues 

•  There  is  a  tradeoff  in  system  design  between  speech  quality,  vocabulary  size,  and  cost. 

• There is the problem of how best to choose the fundamental units to be used: allophones,
syllables, or words. The smaller units permit very large vocabularies without excessive storage
requirements, while the larger units (such as phrases) provide superior speech quality.

• Memory cost considerations tend to restrict the use of the word synthesis approach.

• As synthesizer techniques improve, errors due to low sampling rates and
inadequate consideration of coarticulation and prosodic (speech stress) effects may become the
limiting factors.

• Speech compression techniques are crucial to minimize memory requirements in the
synthesizer.

•  The  high  cost  of  generating  words  for  synthesizer  vocabularies  needs  to  be  reduced. 

•  Similarly,  the  high  cost  of  storing  words  in  ROM  needs  to  be  addressed. 

•  Updating  stored  vocabularies  is  problematical  due  to  the  need  to  keep  the  same  speaker 
available. 

M.  Forecast 

Though  the  market  for  voice  synthesizers  is  still  relatively  small,  it  is  estimated  that  it  will  be 
close  to  one-half  billion  dollars  by  1985  and  will  reach  several  billion  dollars  by  1990.  Talking 
devices  will  have  a  big  impact  on  industrial  operations,  a  major  effect  on  learning  devices,  and 
will  probably  be  ubiquitous  throughout  home  and  consumer  products.  These  devices  will  be  a 
boon  to  the  handicapped,  in  everything  from  talking  typewriters  and  appliances,  and  reading 
machines  for  the  blind,  to  speech  prosthetics.  It  is  also  anticipated  that  these  devices  will  be  found 
virtually  everywhere  in  vehicles  and  transportation  systems. 

Because of their integration into single chips, the cost of stored vocabulary devices will
continue to drop so that basic hardware costs of less than $10, for units having vocabularies of
several hundred words, are foreseen by the end of this decade.

TABLE V-2. Some Available Commercial Synthesizer Systems (cont.)

*Chip prices range from $3 to $15 depending on model and quantity. Speech Plus provides custom vocabulary generation services for speech
synthesizer chips at $100/word.


REFERENCES 


•  Andreiev,  N.,  “Speech  Synthesis:  High  Technology’s  Dark  Horse  in  Search  of  New 
Pastures,”  Control  Engineering,  September  1981,  pp.  95-98. 

• Berney, C. L. and Harshman, C., “Voice Ware Does It Differently,” Mini-Micro Systems,
March 1982, pp. 183-193.

•  Brightman,  T.  and  Crook,  S.,  “Exploring  Practical  Speech  I/O,”  Mini-Micro  Systems,  May 
1982,  pp.  291-306. 

• Ciarcia, S., “Use ADPCM for Highly Intelligible Speech Synthesis,” Byte, June 1983, pp.
35-49.

•  Damper,  R.  I.,  “Speech  Technology  —  Implications  for  Biomedical  Engineering,”  J.  of 
Medical Engineering and Technology, Vol 6, No. 4 (July/Aug 1982), pp. 135-149.

• Elphick, M., “Talking Machines Aim for Versatility,” High Technology, Sept/Oct 1981, pp.
41-48. 

•  Flanagan,  J.  L.,  “Talking  with  Computers:  Synthesis  and  Recognition  of  Speech  by 
Machines,”  IEEE  Transactions  on  Biomedical  Engineering,  Vol  BME-29,  No.  4,  April  1982,  pp. 
223-232. 

• Gilblom, D. L., “A High-Quality Real-Time Text-to-Speech Converter,” Electro-82
Professional Program Record, Session 11, Boston, Mass, May 1982, Paper No. 11/2.

•  Lerner,  E.  J.,  “Products  That  Talk,”  Spectrum,  July  1982,  pp.  32-37. 

•  Sherwood,  B.  A.,  “The  Computer  Speaks,”  Spectrum,  August  1979,  pp.  18-25. 

•  Zue,  V.  W.,  Tutorial  on  Natural  Language  Interfaces:  Part  2-Speech,  Menlo  Park,  CA: 
AAAI,  Aug  17,  1982. 




VI.  PROBLEM  SOLVING  AND  PLANNING 


A.  Introduction 

Nilsson at SRI originally specified problem solving and planning as being one of the four
fundamental application areas of AI. However, the “weak methods,” employing little domain
knowledge, originally used in AI for problem solving and planning, proved inadequate for
complex real-world problems. Thus, in seeking solutions in this area, larger amounts of knowledge
have since been utilized. The net result has been that the “Knowledge Engineering” methodology
used for Expert Systems has been adapted for use in problem-solving and planning. Thus, the
boundary between problem-solving and planning and expert systems has faded, and it is now
common to refer to all these knowledge-based activities as expert systems; they are therefore covered in
that volume of this series. Nevertheless, this chapter will briefly review some of the earlier less
knowledge-intensive systems and several examples of recent systems.

B.  Planning  Defined 

Most AI applications can be considered examples of problem-solving, and these are well
covered under the other AI application areas: Expert Systems, Computer Vision, Language
Understanding, etc. In this chapter we will only consider planning systems. Planning can be
defined for our purposes as the design process for selecting and stringing together individual
actions into sequences in order to achieve desired goals.

C.  Basic  Planning  Paradigm 

Wilensky (1983) outlines the basic structure of plans from the viewpoint of common-sense
problem solving and natural language understanding. A schematic for Wilensky’s basic planning
paradigm is given in Figure VI-1. In this paradigm, the planner recognizes from the environment
that a new situation has arisen which merits a goal. The planner then retrieves from memory a
plan that might be used to achieve this goal, or generates a new trial plan if no existing plan is
suitable. This candidate plan is then projected forward (via simulation) to observe the outcome.
This outcome is examined to see if there are any conflicts that will arise in achieving other goals if
this plan is pursued. If not, this and other candidate plan outcomes are evaluated and the
maximum-valued plan is chosen. The plan, when implemented, will modify the current state of
affairs. This impact, together with any other changes in the environment, results in a new world
model with new situations that may merit new goals, so that the cyclic process of planning
continues. When candidate plans are being considered, if the candidate plan overlaps existing plans
for other goals, these overlapping plans may be merged to conserve resources.
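
The project, check, and evaluate steps of this cycle can be sketched in miniature. The code below is an illustration of the general idea only, not Wilensky’s program: the world state, the three candidate plans, the preservation goal, and the value function are all hypothetical, and goal detection and plan merging are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


@dataclass
class Plan:
    name: str
    project: Callable[[State], State]   # simulates the plan's outcome


def choose_plan(state: State,
                candidates: List[Plan],
                preservation_goals: List[Callable[[State], bool]],
                value: Callable[[State], float]) -> Optional[Plan]:
    """Project each candidate forward, drop those that violate a preservation
    goal, and return the highest-valued survivor (or None if none survive)."""
    best, best_value = None, float("-inf")
    for plan in candidates:
        outcome = plan.project(dict(state))          # simulate on a copy
        if not all(goal(outcome) for goal in preservation_goals):
            continue                                 # conflicts with another goal
        outcome_value = value(outcome)
        if outcome_value > best_value:
            best, best_value = plan, outcome_value
    return best


# Hypothetical situation: the planner is hungry and has limited money.
world = {"money": 10, "hunger": 5}
plans = [
    Plan("eat out", lambda s: {"money": s["money"] - 8, "hunger": 0}),
    Plan("cook", lambda s: {"money": s["money"] - 3, "hunger": 1}),
    Plan("skip meal", lambda s: dict(s)),
]
keep_some_money = lambda s: s["money"] >= 5          # a preservation goal
less_hunger_is_better = lambda s: -s["hunger"]

chosen = choose_plan(world, plans, [keep_some_money], less_hunger_is_better)
```

Here the plan that would satisfy hunger completely is rejected because its projected outcome violates the preservation goal, so the planner settles for the best remaining candidate.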

A basic problem in planning is that of conflicting goals. The causes of conflicting goals are
indicated in Figure VI-2. (A preservation goal is a goal to preserve an already existing condition, or
a goal not to undo a desirable state or goal resulting from another plan.)

Figure VI-1. Wilensky Planning Paradigm.

Figure VI-2. Nature of Goal Conflicts.


Problems arising from conflicting goals are dealt with by replanning or by eliminating the
factors causing the goal conflicts. A flow diagram for resolving goal conflicts is given in Figure
VI-3. If the goal conflicts cannot be completely resolved, then partial fulfillment of goals may be
attempted or goals of lesser importance may have to be dropped. The global strategy is to achieve
as many goals as possible, maximizing the composite value of the goals achieved, and not waste
resources in achieving them.

DEVISER (Vere, 1983) is a good example of a planning program designed to deal with
conflicting goals resulting from resource and time constraints.

Wilensky  also  discusses  “competing  goals”  that  arise  in  competitive  situations.  The  planning 
strategies  given  in  this  case  are  to: 

1)  Avoid  conflicts 

2)  Outdo  an  opponent 

3)  Hinder  an  opponent 

4)  Induce  alterations  in  competitive  plans. 

D.  Paradigms  for  Generating  Plans 

The  major  issue  in  any  planning  system  is  reducing  search.  The  other  key  issue  is  how  to  handle 
interacting  subproblems.  The  following  paradigms  are  different  approaches  to  addressing  these 
issues. 

Cohen and Feigenbaum (1982) discuss four distinct approaches to planning: nonhierarchical,
hierarchical, script-based (skeletal) and opportunistic. Virtually all plans, both hierarchical and
nonhierarchical, have hierarchical subgoal structures. That is, each goal can be expanded into
several subgoals, which themselves can be further expanded, and so on, until the bottom level
consists of the operators needed to achieve the lowest level goals. The distinction between
hierarchical and nonhierarchical planners is that “...a hierarchical planner generates a hierarchy
of representations of a plan in which the highest is a simplification, or abstraction of the plan
and the lowest is a detailed plan, sufficient to solve the problem. In contrast, nonhierarchical
planners have only one representation of a plan.” (pp. 516-517)
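The distinction can be made concrete with a small Python sketch. It expands a goal level by level and keeps every intermediate representation, from the most abstract to the fully detailed. The `methods` table and all goal and operator names are hypothetical, and the goal structure is assumed to be acyclic.

```python
def plan_levels(goal, methods):
    """Return successive representations of a plan, most abstract first.

    methods maps a goal to the ordered subgoals that achieve it. Any name
    without an entry is treated as a primitive operator. Assumes the goal
    structure contains no cycles.
    """
    level = [goal]
    levels = [level]
    while any(step in methods for step in level):
        # Replace every expandable step by its subgoals; keep operators.
        level = [sub for step in level for sub in methods.get(step, [step])]
        levels.append(level)
    return levels


# Hypothetical goal structure for illustration only.
methods = {
    "deliver_part": ["fetch_part", "transport_part"],
    "fetch_part": ["goto_store", "pick_up"],
    "transport_part": ["goto_bench", "put_down"],
}
for depth, level in enumerate(plan_levels("deliver_part", methods)):
    print(depth, level)
```

In these terms, a hierarchical planner works with the whole list of levels, while a nonhierarchical planner holds only the last, fully detailed one.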

1.  Nonhierarchical  Planning 

Nonhierarchical planning does not initially distinguish between important and unimportant actions,
so everything is considered in the initial plan, including cumbersome details. For complex
problems, this often results in a large search. One way the search can be greatly reduced is by
initially assuming that subgoals are independent and then trying to repair the plan to account for
the interactions (as in HACKER, Table VI-1-2).

A knowledge-based approach, used in ISIS-II (Fox et al., 1982), is to prune the search space prior
to search by using constraints, and then to narrow the space actually searched by using a “beam
search” approach.
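The combination of constraint pruning and beam search can be sketched generically in Python. This is not the ISIS-II algorithm: the job names, durations, precedence constraint and scoring function are all hypothetical. As a simplification, constraints are applied as states are generated rather than before the search begins.

```python
def beam_search(start, successors, is_goal, score, constraints,
                beam_width=2, max_depth=20):
    """Level-by-level search that keeps only the beam_width best states.

    successors(state) -> iterable of next states
    constraints       -> predicates; a state violating any one is pruned
    score(state)      -> lower is better
    Returns the first goal state generated, or None if none is found.
    """
    frontier = [start]
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for nxt in successors(state):
                if all(ok(nxt) for ok in constraints):
                    if is_goal(nxt):
                        return nxt
                    candidates.append(nxt)
        if not candidates:
            return None
        frontier = sorted(candidates, key=score)[:beam_width]
    return None


# Hypothetical sequencing problem: job name -> duration.
jobs = {"sand": 3, "paint": 2, "inspect": 1}

def successors(seq):
    return [seq + (job,) for job in jobs if job not in seq]

def sand_before_paint(seq):
    return "paint" not in seq or (
        "sand" in seq and seq.index("sand") < seq.index("paint"))

def total_completion_time(seq):
    clock = total = 0
    for job in seq:
        clock += jobs[job]
        total += clock
    return total

result = beam_search((), successors, lambda s: len(s) == len(jobs),
                     total_completion_time, [sand_before_paint])
print(result)
```

A narrow beam keeps the search small, but it can discard the state that leads to the best plan, so the result is not guaranteed to be optimal.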

2.  Hierarchical  Planning 

In this approach, first a high-level plan is formulated considering only the important aspects,
then the vague parts of the plan are refined into more detailed subplans. By ignoring the details at
the higher levels, search is vastly reduced. ABSTRIPS (Table VI-1-5) is illustrative of this
approach.

Figure VI-3. Resolving Conflicting Goals by Replanning (and/or Attacking Factors Causing
Conflicts).

3.  Utilization  of  Skeleton  Plans 

This  approach  utilizes  stored  plans  which  contain  the  outlines  for  solving  many  different  kinds 
of  problems.  The  skeleton  plans  are  then  filled  in  for  the  particular  problem  being  solved.  This 
technique  has  similarities  to  Schank’s  script-based  approach  to  language  understanding.  KNOBS 
(Engelman  et  al.,  1980,  Table  VI-1-10),  a  frame-based  planning  system  for  tactical  air  strikes,  is 
an  example  of  a  skeletal  plan  approach. 
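The idea of filling in a stored skeleton can be sketched in a few lines of Python. The skeleton library, the slot names and the delivery example are hypothetical, and nothing here reflects the actual frame representation used in KNOBS.

```python
def instantiate(skeleton, bindings):
    """Fill each step template of a skeleton plan with problem-specific values.

    Raises KeyError if a slot named in the skeleton has no binding.
    """
    return [step.format(**bindings) for step in skeleton]


# Hypothetical library of stored skeleton plans, keyed by problem type.
SKELETONS = {
    "delivery": [
        "load {cargo} at {origin}",
        "move from {origin} to {destination}",
        "unload {cargo} at {destination}",
    ],
}

plan = instantiate(
    SKELETONS["delivery"],
    {"cargo": "pump", "origin": "depot", "destination": "site"},
)
print(plan)
```

The outline of the plan is fixed in advance, so no search over step orderings is needed. The remaining work is choosing a suitable skeleton and finding values for its slots.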

4.  Opportunistic  Planning 

Opportunistic planning (Hayes-Roth and Hayes-Roth, 1978) is based on the way that humans
often approach planning. In this approach, the plan is developed piecewise, with parts of the plan
being developed separately, and then added to, enlarged and linked together as opportunities
present themselves. Planning of this sort incorporates both top-down and bottom-up components.

E.  Planners 

In this section we summarize the characteristics of some of the key AI planning systems that
have evolved over the years. Figure VI-4 diagrams the various systems that are reviewed and their
relation to the basic paradigms. Tables VI-1-1 through VI-1-11 outline the systems shown in
Figure VI-4, using the Expert Systems format (Figure I-1) developed in Chapter I. Note that
planners evolve by building on past techniques. For example, DEVISER (Table VI-1-9), the first
planner to deal explicitly with time, is based on NOAH (Table VI-1-4), with facilities added to
keep track of event “windows” and durations. Figure VI-5 presents a simplified flow chart of
DEVISER's core planning component.
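To give a feel for what keeping track of windows and durations involves, the Python sketch below assigns start times to a fixed sequence of activities, each with a duration and a window in which it may start. The activity names, the tuple layout and the single-timeline assumption are hypothetical; this is not DEVISER's algorithm, only a minimal example of the bookkeeping.

```python
def schedule(activities, start_time=0):
    """Assign the earliest feasible start to each activity in a fixed sequence.

    activities: list of (name, duration, (earliest_start, latest_start)).
    Returns a list of (name, start, end), or None if some window is missed.
    """
    clock = start_time
    timeline = []
    for name, duration, (earliest, latest) in activities:
        start = max(clock, earliest)  # wait for the window to open if needed
        if start > latest:
            return None  # window has closed; a planner would try another ordering
        timeline.append((name, start, start + duration))
        clock = start + duration
    return timeline


# Hypothetical activities: (name, duration, start window).
activities = [
    ("warm_up", 5, (0, 10)),
    ("take_image", 2, (8, 9)),
    ("transmit", 4, (0, 20)),
]
print(schedule(activities))
```

When the sketch returns None, the ordering it was given cannot meet some window, which is the point at which a full planner would have to consider an alternative.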

Information  on  current  research  in  planning  is  given  in  Robinson  (1983). 

F.  Trends 

Automatic Planning is still a difficult task. The current trend is toward the use of knowledge
engineering to configure planners as expert systems. Thus, knowledge-based planners are
included, and further discussed, in the volume on expert systems.

Another trend is toward increased concern with spatial-temporal planning. This is exemplified
by Malik and Binford (1983), Allen and Koomen (1983) and Brooks (1983).

TABLE VI-1-1. Planners.

TABLE VI-1-2. Planners.

cedure is written using the programming techniques library.

TABLE VI-1-4. Planners.

TABLE VI-1-5. Planners.

TABLE VI-1-6. Planners.

TABLE VI-1-7. Planners.

is encountered.

TABLE VI-1-8. Planners.

TABLE VI-1-9. Planners.

TABLE VI-1-10. Planners.

TABLE VI-1-11. Planners.

i. Perform post-search analysis to determine if search was effective.

BACKTRACK TO LAST CHOICE POINT AND TRY ANOTHER ALTERNATIVE

Figure VI-5. Simplified Flow Chart of Deviser’s Core Planner.